01 - Gradient Descent

Key steps in machine learning

  1. Implement the gradient descent algorithm (one gradient descent step updates m and b once; see the scalar sketch after this list)
  2. Train until the gradient reaches its lowest point (update m and b many times, until the minimum is reached)
  3. Test the accuracy of the training result (use the test-set data to check how accurately m and b predict)
  4. Make new predictions (apply the trained m and b to new inputs)
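Before switching to matrices, here is what a single update of m and b looks like written out element-wise. This is a minimal sketch with made-up toy values for x, y, m, b, and the learning rate, not the dataset used below:

import numpy as np

# Hypothetical toy data: areas x and prices y
x = np.array([80.0, 95.0, 104.0])
y = np.array([200.0, 230.0, 245.0])
m, b = 1.0, 1.0    # initial guesses
lr = 1e-5          # learning rate (assumed value)

# One gradient descent step: move m and b against the MSE gradient
error = (m * x + b) - y              # Guess - Actual for each point
m = m - lr * np.sum(2 * x * error)   # dMSE/dm = sum of 2x(mx + b - y)
b = b - lr * np.sum(2 * error)       # dMSE/db = sum of 2(mx + b - y)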

Redoing gradient descent with matrices

Review:

How are two matrices multiplied?

What does the result of a matrix multiplication look like?
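As a quick refresher, a small example with a (2, 3) matrix and a (3, 1) matrix; the inner dimensions must match, and the result takes the outer dimensions (2, 1). The values here are arbitrary:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1], [0], [2]])    # shape (3, 1)

# (2, 3) x (3, 1) -> (2, 1): each output entry is a row-by-column dot product
print(np.dot(A, B))  # [[ 7]
                     #  [16]]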

import numpy as np

# Each row is one house: [area, price]
data = np.array([
    [80, 200],
    [95, 230],
    [104, 245],
    [112, 247],
    [125, 259],
    [135, 262]
])

Can the two matrices be multiplied? And what does the product look like?

The Feature matrix

\begin{bmatrix} Area_1 \\ Area_2 \\ Area_3 \\ Area_4 \\ Area_5 \\ Area_6\end{bmatrix}

The shape of this matrix is (6, 1).

The Weight matrix (the weights)

\begin{bmatrix} m \\ b\end{bmatrix}

The shape of this matrix is (2, 1).

The two matrices have incompatible shapes, so they cannot be multiplied.
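A quick sketch confirming the mismatch: np.dot raises a ValueError because the inner dimensions (1 and 2) disagree:

import numpy as np

feature = np.array([[80], [95], [104], [112], [125], [135]])  # shape (6, 1)
weight = np.array([[1.0], [1.0]])                             # shape (2, 1)

try:
    np.dot(feature, weight)
except ValueError as e:
    print(e)  # shapes (6,1) and (2,1) not aligned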

The shapes differ because the house area is the variable x, which should be multiplied by m, while b is the offset, which should be multiplied by the constant 1.

\begin{bmatrix} Area_1 & 1\\ Area_2 & 1\\ Area_3 & 1\\ Area_4 & 1\\ Area_5 & 1\\ Area_6& 1\end{bmatrix} × \begin{bmatrix} m \\ b\end{bmatrix} = \begin{bmatrix} Area_1*m+b \\ Area_2*m+b \\ Area_3*m+b \\ Area_4*m+b \\ Area_5*m+b \\ Area_6*m+b\end{bmatrix}

Feature × Weight directly gives the predicted house prices; the shapes now line up as (6, 2) × (2, 1) → (6, 1).
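A minimal sketch of the fixed multiplication, using the six areas from the table above and placeholder weights m = 1, b = 1:

import numpy as np

area = np.array([[80], [95], [104], [112], [125], [135]])
Feature = np.hstack((area, np.ones((6, 1))))  # append the constant-1 column -> shape (6, 2)
Weight = np.array([[1.0], [1.0]])             # [m, b], shape (2, 1)

# (6, 2) x (2, 1) -> (6, 1): row i is Area_i * m + b, the predicted price
print(np.dot(Feature, Weight))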

Feature × Weight - Label = \begin{bmatrix} Area_1*m+b - Actual_1 \\ Area_2*m+b - Actual_2\\ Area_3*m+b - Actual_3 \\ Area_4*m+b - Actual_4\\ Area_5*m+b - Actual_5\\ Area_6*m+b - Actual_6\end{bmatrix} = \begin{bmatrix} Guess_1 - Actual_1 \\ Guess_2 - Actual_2\\ Guess_3 - Actual_3 \\ Guess_4 - Actual_4\\ Guess_5 - Actual_5\\ Guess_6 - Actual_6\end{bmatrix}
Feature.T = \begin{bmatrix} Area_1 & Area_2 & Area_3 & Area_4 & Area_5 & Area_6 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}
Feature.T × (Feature × Weight - Label) = \begin{bmatrix} \displaystyle \sum_{i=1}^{6} Area_i \left( Guess_i - Actual_{i}\right) \\ \displaystyle \sum_{i=1}^{6} \left(Guess_i - Actual_{i} \right) \end{bmatrix}

The two rows of this matrix product are, up to a constant factor of 2, the partial derivatives of MSE with respect to m and b respectively:

\dfrac {\partial\,MSE}{\partial b} = \displaystyle \sum_{i=1}^{6} \left(- 2\, Actual_{i} + 2 b + 2 m x_i\right)
\dfrac {\partial\,MSE}{\partial m} = \displaystyle \sum_{i=1}^{6} 2 x_i \left(- Actual_{i} + b + m x_i\right)
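A small sanity check of this correspondence: the summed derivatives above come out to exactly twice the rows of Feature.T × (Feature × Weight - Label). The sketch below uses the table's data with placeholder weights m = b = 1; the constant factor of 2 can simply be absorbed into the learning rate:

import numpy as np

x = np.array([80.0, 95.0, 104.0, 112.0, 125.0, 135.0])
y = np.array([200.0, 230.0, 245.0, 247.0, 259.0, 262.0])
m, b = 1.0, 1.0

Feature = np.hstack((x.reshape(-1, 1), np.ones((6, 1))))
Weight = np.array([[m], [b]])
Label = y.reshape(-1, 1)

# Matrix form of the gradient (rows: d/dm, d/db, each missing the factor 2)
matrix_grad = np.dot(Feature.T, np.dot(Feature, Weight) - Label)

# Element-wise partial derivatives from the formulas above
dm = np.sum(2 * x * (m * x + b - y))
db = np.sum(2 * (m * x + b - y))

print(np.allclose(2 * matrix_grad.flatten(), [dm, db]))  # True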

The gradient descent formula in matrix form

Partial derivatives of MSE with respect to m and b = Feature.T × (Feature × Weight - Label), with the constant factor of 2 absorbed into the learning rate.

Implementing gradient descent with matrix operations

import numpy as np
data = np.array([
    [80, 200],
    [95, 230],
    [104, 245],
    [112, 247],
    [125, 259],
    [135, 262]
])
# Area column as a (6, 1) matrix
feature = data[:, 0:1]

# Column of ones for the offset b
ones = np.ones((len(feature), 1))

# Feature matrix: each row is [area, 1], shape (6, 2)
Feature = np.hstack((feature, ones))

# Actual prices as a (6, 1) matrix
Label = data[:, -1:]

# Initial weights [m, b], shape (2, 1)
weight = np.ones((2, 1))
weight

# Gradient (up to the factor 2) at the initial weights
np.dot(Feature.T, (np.dot(Feature, weight) - Label))
# A small learning rate is needed: area values around 100 make the gradient entries large
learningrate = 0.00001

def gradient_descent():
    global weight
    weight = weight - learningrate * np.dot(Feature.T, (np.dot(Feature, weight) - Label))
%%time
# Run ten million update steps, printing the weights every million iterations
for i in range(10000000):
    gradient_descent()
    if i % 1000000 == 0:
        print(weight)
After roughly ten million iterations the weights converge to

[[  1.08593592]
 [122.67595099]]

that is, m ≈ 1.086 and b ≈ 122.68.
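That covers steps 1 and 2 from the list at the top. Step 4, making a new prediction, is then a one-liner with the trained weights; the area value 100 here is a hypothetical example:

# Predict the price for a hypothetical new house of area 100
new_area = 100
prediction = new_area * weight[0, 0] + weight[1, 0]  # m * x + b
print(prediction)  # about 1.086 * 100 + 122.676 ≈ 231.3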