My Note
Shallow neural network
1.The form after vectorization
Forward propagation:
\(\begin{aligned} &z^{[1](1)} = W^{[1]}x^{(1)} + b^{[1]}\\ &a^{[1](1)} = \sigma(z^{[1](1)})\\ &z^{[2](1)} = W^{[2]}a^{[1](1)} + b^{[2]}\\ &a^{[2](1)} = \sigma(z^{[2](1)}) \end{aligned}\)
The “[i]” refers to the “i”th layer of this network.
The “(j)” refers to the “j”th training example. (The weights $W$ and biases $b$ are shared across all examples, so they carry no “(j)” superscript.)
$X/A/Z:\;\leftrightarrow{\text{the number of training examples}}$;
$\updownarrow{\text{the number of features/hidden units}}$
$W:\;\leftrightarrow{\text{the number of units in the previous layer}}$;
$\updownarrow{\text{the number of units in the current layer}}$
```python
import numpy as np

# A small demonstration of a neural network with one hidden layer
# Training examples: 5
# Input features: 3
# Hidden layer neurons: 4
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def ReLU(x):
    return np.maximum(0, x)

X = np.array([[4, 8, 87], [4, -114, 5], [2, -7, 7], [4, 8, 2], [0, 48, -7]]).T  # shape (3, 5)

W1 = np.random.rand(4, 3)        # random weights for layer 1, shape (4, 3)
b1 = np.zeros((4, 1))            # one bias per hidden unit (broadcast over examples)
A1 = ReLU(np.dot(W1, X) + b1)    # hidden activations, shape (4, 5)

W2 = np.random.rand(1, 4)        # random weights for layer 2, shape (1, 4)
b2 = np.zeros((1, 1))            # output-layer bias
Y_hat = sigmoid(np.dot(W2, A1) + b2)  # predictions, shape (1, 5)
print(Y_hat)
# Example output (exact values vary with the random weights):
# [[1. 0.56647244 0.96815955 0.99999989 1. ]]
```
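The shape conventions above can be sanity-checked directly; this is a minimal sketch (the sizes `n_x`, `n_h`, `n_y`, `m` are illustrative choices, not from the note):

```python
import numpy as np

# W[l] has shape (units in layer l, units in layer l-1);
# X/Z/A have shape (features or units, number of training examples).
n_x, n_h, n_y, m = 3, 4, 1, 5    # input features, hidden units, outputs, examples

W1 = np.random.rand(n_h, n_x)
b1 = np.zeros((n_h, 1))
X = np.random.rand(n_x, m)

Z1 = np.dot(W1, X) + b1          # (4,3) @ (3,5) -> (4,5); b1 broadcasts over columns
assert Z1.shape == (n_h, m)

W2 = np.random.rand(n_y, n_h)
b2 = np.zeros((n_y, 1))
Z2 = np.dot(W2, Z1) + b2         # (1,4) @ (4,5) -> (1,5)
assert Z2.shape == (n_y, m)
print(Z1.shape, Z2.shape)        # (4, 5) (1, 5)
```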
2.The selection of the activation function
- Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- Tanh function: $\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$ (its mean output is close to 0, which helps center the data for the next layer)
- ReLU function: $\mathrm{ReLU}(z) = \max(0,z)$ (avoids the vanishing-gradient problem for $z > 0$)
- Leaky ReLU: $\mathrm{LeakyReLU}(z) = \max(\alpha z, z)$ with a small $\alpha$ (e.g. 0.01), so the gradient is never exactly zero
The activation function can differ from layer to layer!
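The activations above (and the derivatives that back propagation will need) can be sketched in a few lines; the derivative helpers and the `alpha` parameter are illustrative names, not from the note:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)          # peaks at 0.25, so gradients shrink in deep stacks

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2  # tanh itself is np.tanh

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)  # gradient is exactly 1 for z > 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope keeps gradient nonzero

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))       # outputs in (0, 1), mean near 0.5 rather than 0
print(np.tanh(z))       # zero-centered outputs
print(relu(z))          # [0. 0. 3.]
print(leaky_relu(z))    # [-0.02  0.    3.  ]
```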
3.Gradient descent for shallow neural network
Back propagation:
\(\begin{aligned} &dZ^{[2]} = A^{[2]} - Y\\ &dW^{[2]} = \frac{1}{m}dZ^{[2]}A^{[1]T}\\ &db^{[2]} = \frac{1}{m}\text{np.sum}(dZ^{[2]},\text{axis} = 1, \text{keepdims} = \text{True})\\ &dZ^{[1]} = W^{[2]T}dZ^{[2]}*g^{[1]\prime}(Z^{[1]})\\ &dW^{[1]} = \frac{1}{m}dZ^{[1]}X^{T}\\ &db^{[1]} = \frac{1}{m}\text{np.sum}(dZ^{[1]},\text{axis} = 1, \text{keepdims} = \text{True})\\ \end{aligned}\)
