01 - Gradient Descent

Key steps in machine learning

  1. Implement the gradient descent algorithm (one gradient descent step updates m and b once; see the scalar sketch after this list)
  2. Train until the gradient reaches its lowest point (update m and b many times, until the minimum is reached)
  3. Test the accuracy of the training result (use the test-set data to check how accurately m and b predict)
  4. Make new predictions (apply the trained m and b to new inputs)
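Before switching to matrices, here is what a single update of m and b looks like written out element-wise. This is a minimal sketch with made-up toy values for x, y, m, b, and the learning rate, not the dataset used below:

import numpy as np

# Hypothetical toy data: areas x and prices y
x = np.array([80.0, 95.0, 104.0])
y = np.array([200.0, 230.0, 245.0])
m, b = 1.0, 1.0    # initial guesses
lr = 1e-5          # learning rate (assumed value)

# One gradient descent step: move m and b against the MSE gradient
error = (m * x + b) - y              # Guess - Actual for each point
m = m - lr * np.sum(2 * x * error)   # dMSE/dm = sum of 2x(mx + b - y)
b = b - lr * np.sum(2 * error)       # dMSE/db = sum of 2(mx + b - y)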

Redoing gradient descent with matrices

Review:

How are two matrices multiplied?

What does the result of a matrix multiplication look like?
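As a quick refresher, a small example with a (2, 3) matrix and a (3, 1) matrix; the inner dimensions must match, and the result takes the outer dimensions (2, 1). The values here are arbitrary:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1], [0], [2]])    # shape (3, 1)

# (2, 3) x (3, 1) -> (2, 1): each output entry is a row-by-column dot product
print(np.dot(A, B))  # [[ 7]
                     #  [16]]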

import numpy as np

# Each row is one house: [area, price]
data = np.array([
    [80, 200],
    [95, 230],
    [104, 245],
    [112, 247],
    [125, 259],
    [135, 262]
])

Can the two matrices be multiplied? And what does the product look like?

The Feature matrix

\begin{bmatrix} Area_1 \\ Area_2 \\ Area_3 \\ Area_4 \\ Area_5 \\ Area_6\end{bmatrix}

The shape of this matrix is (6, 1).

The Weight matrix (the weights)

\begin{bmatrix} m \\ b\end{bmatrix}

The shape of this matrix is (2, 1).

The two matrices have incompatible shapes, so they cannot be multiplied.
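A quick sketch confirming the mismatch: np.dot raises a ValueError because the inner dimensions (1 and 2) disagree:

import numpy as np

feature = np.array([[80], [95], [104], [112], [125], [135]])  # shape (6, 1)
weight = np.array([[1.0], [1.0]])                             # shape (2, 1)

try:
    np.dot(feature, weight)
except ValueError as e:
    print(e)  # shapes (6,1) and (2,1) not aligned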

The shapes differ because the house area is the variable x, which should be multiplied by m, while b is the offset, which should be multiplied by the constant 1.

\begin{bmatrix} Area_1 & 1\\ Area_2 & 1\\ Area_3 & 1\\ Area_4 & 1\\ Area_5 & 1\\ Area_6& 1\end{bmatrix} × \begin{bmatrix} m \\ b\end{bmatrix} = \begin{bmatrix} Area_1*m+b \\ Area_2*m+b \\ Area_3*m+b \\ Area_4*m+b \\ Area_5*m+b \\ Area_6*m+b\end{bmatrix}

Feature × Weight directly gives the predicted house prices; the shapes now line up as (6, 2) × (2, 1) → (6, 1).
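A minimal sketch of the fixed multiplication, using the six areas from the table above and placeholder weights m = 1, b = 1:

import numpy as np

area = np.array([[80], [95], [104], [112], [125], [135]])
Feature = np.hstack((area, np.ones((6, 1))))  # append the constant-1 column -> shape (6, 2)
Weight = np.array([[1.0], [1.0]])             # [m, b], shape (2, 1)

# (6, 2) x (2, 1) -> (6, 1): row i is Area_i * m + b, the predicted price
print(np.dot(Feature, Weight))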

Feature × Weight - Label = \begin{bmatrix} Area_1*m+b - Actual_1 \\ Area_2*m+b - Actual_2\\ Area_3*m+b - Actual_3 \\ Area_4*m+b - Actual_4\\ Area_5*m+b - Actual_5\\ Area_6*m+b - Actual_6\end{bmatrix} = \begin{bmatrix} Guess_1 - Actual_1 \\ Guess_2 - Actual_2\\ Guess_3 - Actual_3 \\ Guess_4 - Actual_4\\ Guess_5 - Actual_5\\ Guess_6 - Actual_6\end{bmatrix}
Feature.T = \begin{bmatrix} Area_1 & Area_2 & Area_3 & Area_4 & Area_5 & Area_6 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}
Feature.T × (Feature × Weight - Label) = \begin{bmatrix} \displaystyle \sum_{i=1}^{6} Area_i \left( Guess_i - Actual_{i}\right) \\ \displaystyle \sum_{i=1}^{6} \left(Guess_i - Actual_{i} \right) \end{bmatrix}

The two rows of this matrix product are, up to a constant factor of 2, the partial derivatives of MSE with respect to m and b respectively:

\dfrac {\partial\,MSE}{\partial b} = \displaystyle \sum_{i=1}^{6} \left(- 2\, Actual_{i} + 2 b + 2 m x_i\right)
\dfrac {\partial\,MSE}{\partial m} = \displaystyle \sum_{i=1}^{6} 2 x_i \left(- Actual_{i} + b + m x_i\right)
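A small sanity check of this correspondence: the summed derivatives above come out to exactly twice the rows of Feature.T × (Feature × Weight - Label). The sketch below uses the table's data with placeholder weights m = b = 1; the constant factor of 2 can simply be absorbed into the learning rate:

import numpy as np

x = np.array([80.0, 95.0, 104.0, 112.0, 125.0, 135.0])
y = np.array([200.0, 230.0, 245.0, 247.0, 259.0, 262.0])
m, b = 1.0, 1.0

Feature = np.hstack((x.reshape(-1, 1), np.ones((6, 1))))
Weight = np.array([[m], [b]])
Label = y.reshape(-1, 1)

# Matrix form of the gradient (rows: d/dm, d/db, each missing the factor 2)
matrix_grad = np.dot(Feature.T, np.dot(Feature, Weight) - Label)

# Element-wise partial derivatives from the formulas above
dm = np.sum(2 * x * (m * x + b - y))
db = np.sum(2 * (m * x + b - y))

print(np.allclose(2 * matrix_grad.flatten(), [dm, db]))  # True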

The gradient descent formula in matrix form

Partial derivatives of MSE with respect to m and b = Feature.T × (Feature × Weight - Label), with the constant factor of 2 absorbed into the learning rate.

Implementing gradient descent with matrix operations

import numpy as np
data = np.array([
    [80, 200],
    [95, 230],
    [104, 245],
    [112, 247],
    [125, 259],
    [135, 262]
])
# Area column as a (6, 1) matrix
feature = data[:, 0:1]

# Column of ones for the offset b
ones = np.ones((len(feature), 1))

# Feature matrix: each row is [area, 1], shape (6, 2)
Feature = np.hstack((feature, ones))

# Actual prices as a (6, 1) matrix
Label = data[:, -1:]

# Initial weights [m, b], shape (2, 1)
weight = np.ones((2, 1))
weight

# Gradient (up to the factor 2) at the initial weights
np.dot(Feature.T, (np.dot(Feature, weight) - Label))
# A small learning rate is needed: area values around 100 make the gradient entries large
learningrate = 0.00001

def gradient_descent():
    global weight
    weight = weight - learningrate * np.dot(Feature.T, (np.dot(Feature, weight) - Label))
%%time
# Run ten million update steps, printing the weights every million iterations
for i in range(10000000):
    gradient_descent()
    if i % 1000000 == 0:
        print(weight)
After roughly ten million iterations the weights converge to

[[  1.08593592]
 [122.67595099]]

that is, m ≈ 1.086 and b ≈ 122.68.
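That covers steps 1 and 2 from the list at the top. Step 4, making a new prediction, is then a one-liner with the trained weights; the area value 100 here is a hypothetical example:

# Predict the price for a hypothetical new house of area 100
new_area = 100
prediction = new_area * weight[0, 0] + weight[1, 0]  # m * x + b
print(prediction)  # about 1.086 * 100 + 122.676 ≈ 231.3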