Course notes based on the machine learning course taught by Hung-yi Lee (李宏毅) at National Taiwan University

Preliminaries

CUDA

Compute Unified Device Architecture (CUDA): simply put, an interface that lets software use the GPU for computation
CUDA Runtime API vs. CUDA Driver API

  • The driver version must be ≥ the runtime API version (a version-check sketch follows this list)
  • The driver user-space modules must match the version of the driver kernel modules
  • When we talk about CUDA, we usually mean the runtime API
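A minimal sketch of checking the two versions from Python, assuming PyTorch and the standard nvidia-smi tool are installed: torch.version.cuda reports the CUDA runtime version this PyTorch build was compiled against, while nvidia-smi reports the installed driver version.

import subprocess
import torch

# Runtime version that this PyTorch build was compiled against.
print("CUDA runtime (PyTorch build):", torch.version.cuda)

# Driver version reported by nvidia-smi; it must be >= the runtime version.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("Driver:", driver)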

The following is NVIDIA's original description:

It is composed of two APIs:

  • A low-level API called the CUDA driver API,
  • A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.

The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.

In contrast, the CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.

In short, the driver API is lower level: it takes more code and is harder to use, but gives finer control and more freedom; the runtime API is the opposite.

Containers


(Figure: container vs. virtual machine stacks, both sitting on top of the infrastructure layer)
Simply put, a virtual machine provides stronger isolation than a container: a VM emulates an entire system together with its system APIs, whereas Docker still calls the host's APIs, so Docker is much more lightweight.
Docker is a good way to handle messy environment problems and is lighter-weight than a virtual machine.
Other tools commonly used in this space: Slurm (a job scheduler) and Kubernetes (a container orchestrator).

Docker Hub repository of PyTorch

Theory

Official documentation: pytorch-for-numpy-users

Tensor: a concept for representing n-dimensional data; for example, a 1-D tensor is a vector, a 2-D tensor is a matrix, and so on.
dim in PyTorch == axis in NumPy
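A small sketch of this correspondence (the arrays below are only illustrative): reducing over dimension 0 is spelled axis in NumPy and dim in PyTorch.

import numpy as np
import torch

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
t = torch.arange(6).reshape(2, 3)

print(a.sum(axis=0))             # [3 5 7]
print(t.sum(dim=0))              # tensor([3, 5, 7])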

import torch
import numpy as np

def test():
    print("", torch.cuda.is_available())
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"There are {torch.cuda.device_count()} GPU(s) available.")
        print("Device name:", torch.cuda.get_device_name(0))
    else:
        print("No GPU available, using the CPU instead.")
        device = torch.device("cpu")
test()
 True
There are 1 GPU(s) available.
Device name: NVIDIA GeForce RTX 2070
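Once a device has been chosen, tensors (and later, models) are moved onto it with .to(); a minimal sketch, assuming the check above found a GPU:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 3)    # tensors are created on the CPU by default
x = x.to(device)        # moved to the GPU if one is available
print(x.device)         # e.g. cuda:0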

Below are some basic tensor operations:

x = torch.tensor([[1, -1], [-1, 1]])
print(x)
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))
print(x)
tensor([[ 1, -1],
        [-1,  1]])
tensor([[ 1, -1],
        [-1,  1]])
x = torch.zeros([2, 2])
print(x)
x = torch.ones([1, 2, 5])
print(x)
tensor([[0., 0.],
        [0., 0.]])
tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])
x = torch.zeros([2, 3])
print(x.shape)

x = x.transpose(0, 1)
print(x.shape)
torch.Size([2, 3])  
torch.Size([3, 2])

In each layer of a neural network, the input x is multiplied by the weight matrix W and a bias b is added, giving the output y = Wx + b.
For example, for a model that maps a 32-dimensional input vector to a 64-dimensional output, the weight matrix has shape 64×32, the input vector is 32×1, and the output is 64×1.
After computing this linear weighted sum, an activation function can be applied on top of it (a sketch follows the list below):

  • The Sigmoid function, also called the logistic function, is often used for hidden-layer outputs. Its range is (0, 1), so it maps any real number into the interval (0, 1) and can be used for binary classification. It works well when the features are complicated or do not differ much, and its graph is an S-shaped curve:
    • $ f(x)=\frac{1}{1+e^{-x}} $
  • The ReLU function (Rectified Linear Unit) is a piecewise linear function that alleviates the vanishing-gradient problem of the sigmoid:
    • $ f(x)=\begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases} $
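A minimal sketch of such a layer in PyTorch, using the 32-to-64 sizes from the example above (the choice of ReLU here is only illustrative):

import torch
import torch.nn as nn

layer = nn.Linear(32, 64)   # weight W has shape 64x32, bias b has shape 64
act = nn.ReLU()

x = torch.randn(32)         # a 32-dimensional input vector
y = act(layer(x))           # y = ReLU(Wx + b)
print(y.shape)              # torch.Size([64])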

Loss functions (a sketch follows this list):

  • Mean Squared Error
  • Cross Entropy
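
A minimal sketch of both losses in PyTorch; the tensors below are only illustrative:

import torch
import torch.nn as nn

# Mean squared error, typically used for regression.
mse = nn.MSELoss()
pred = torch.tensor([0.5, 1.5])
target = torch.tensor([1.0, 1.0])
print(mse(pred, target))                   # ((0.5-1)^2 + (1.5-1)^2) / 2 = 0.25

# Cross entropy, typically used for classification: raw logits vs. a class index.
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes
label = torch.tensor([0])                  # index of the correct class
print(ce(logits, label))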