My Note
Shallow neural network
1.The form after vectorization
Forward propagation:
\(\begin{aligned} &z^{[1](1)} = W^{[1]}x^{(1)} + b^{[1]}\\ &a^{[1](1)} = \sigma(z^{[1](1)})\\ &z^{[2](1)} = W^{[2]}a^{[1](1)} + b^{[2]}\\ &a^{[2](1)} = \sigma(z^{[2](1)}) \end{aligned}\)
The “[i]” refers to the “i”th layer of this network.
The “(j)” refers to the “j”th training example. (The weights $W$ and biases $b$ are shared across all examples, so they carry no “(j)” superscript.)
$X/A/Z:\;\leftrightarrow{\text{the number of training examples}}$;
$\updownarrow{\text{the number of features/hidden units}}$
$W:\;\leftrightarrow{\text{the number of units in the previous layer}}$;
$\updownarrow{\text{the number of units in the current layer}}$
```python
import numpy as np

# A small demonstration of a neural network with one hidden layer
# Training examples: 5
# Input features: 3
# Hidden layer neurons: 4
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def ReLU(x):
    return np.maximum(0, x)

X = np.array([[4, 8, 87], [4, -114, 5], [2, -7, 7], [4, 8, 2], [0, 48, -7]]).T  # shape (3, 5)

W1 = np.random.rand(4, 3)        # random weights for layer 1, shape (4, 3)
b1 = np.zeros((4, 1))            # one bias per hidden unit (broadcast over examples)
A1 = ReLU(np.dot(W1, X) + b1)    # hidden activations, shape (4, 5)

W2 = np.random.rand(1, 4)        # random weights for layer 2, shape (1, 4)
b2 = np.zeros((1, 1))            # output-layer bias
Y_hat = sigmoid(np.dot(W2, A1) + b2)  # predictions, shape (1, 5)
print(Y_hat)
# Example output (exact values vary with the random weights):
# [[1. 0.56647244 0.96815955 0.99999989 1. ]]
```
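The shape conventions above can be sanity-checked directly; this is a minimal sketch (the sizes `n_x`, `n_h`, `n_y`, `m` are illustrative choices, not from the note):

```python
import numpy as np

# W[l] has shape (units in layer l, units in layer l-1);
# X/Z/A have shape (features or units, number of training examples).
n_x, n_h, n_y, m = 3, 4, 1, 5    # input features, hidden units, outputs, examples

W1 = np.random.rand(n_h, n_x)
b1 = np.zeros((n_h, 1))
X = np.random.rand(n_x, m)

Z1 = np.dot(W1, X) + b1          # (4,3) @ (3,5) -> (4,5); b1 broadcasts over columns
assert Z1.shape == (n_h, m)

W2 = np.random.rand(n_y, n_h)
b2 = np.zeros((n_y, 1))
Z2 = np.dot(W2, Z1) + b2         # (1,4) @ (4,5) -> (1,5)
assert Z2.shape == (n_y, m)
print(Z1.shape, Z2.shape)        # (4, 5) (1, 5)
```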
2.The selection of the activation function
- Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- Tanh function: $\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$ (its mean output is close to 0, which helps center the data for the next layer)
- ReLU function: $\mathrm{ReLU}(z) = \max(0,z)$ (avoids the vanishing-gradient problem for $z > 0$)
- Leaky ReLU: $\mathrm{LeakyReLU}(z) = \max(\alpha z, z)$ with a small $\alpha$ (e.g. 0.01), so the gradient is never exactly zero
The activation function can differ from layer to layer!
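The activations above (and the derivatives that back propagation will need) can be sketched in a few lines; the derivative helpers and the `alpha` parameter are illustrative names, not from the note:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)          # peaks at 0.25, so gradients shrink in deep stacks

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2  # tanh itself is np.tanh

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)  # gradient is exactly 1 for z > 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope keeps gradient nonzero

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))       # outputs in (0, 1), mean near 0.5 rather than 0
print(np.tanh(z))       # zero-centered outputs
print(relu(z))          # [0. 0. 3.]
print(leaky_relu(z))    # [-0.02  0.    3.  ]
```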
3.Gradient descent for shallow neural network
Back propagation:
\(\begin{aligned} &dZ^{[2]} = A^{[2]} - Y\\ &dW^{[2]} = \frac{1}{m}dZ^{[2]}A^{[1]T}\\ &db^{[2]} = \frac{1}{m}\text{np.sum}(dZ^{[2]},\text{axis} = 1, \text{keepdims} = \text{True})\\ &dZ^{[1]} = W^{[2]T}dZ^{[2]}*g^{[1]\prime}(Z^{[1]})\\ &dW^{[1]} = \frac{1}{m}dZ^{[1]}X^{T}\\ &db^{[1]} = \frac{1}{m}\text{np.sum}(dZ^{[1]},\text{axis} = 1, \text{keepdims} = \text{True})\\ \end{aligned}\)
