My Note

Notation for deep learning

1. Data representation

A single training example $(x, y)$: $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$

$m$ training examples: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$

$m = m_{\text{train}}$, the number of training examples

Stacking the examples as columns:

\[X = \begin{bmatrix} x^{(1)} & x^{(2)} & x^{(3)} & \dots & x^{(m)} \end{bmatrix} \in \mathbb{R}^{n_x \times m}\]

\[Y = \begin{bmatrix} y^{(1)} & y^{(2)} & y^{(3)} & \dots & y^{(m)} \end{bmatrix} \in \mathbb{R}^{1 \times m}\]
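A minimal NumPy sketch of this column-per-example stacking (the shapes and the names `examples`, `labels` are illustrative assumptions):

```python
import numpy as np

n_x, m = 4, 3  # hypothetical feature dimension and number of examples

# one column vector x^(i) in R^{n_x} per example
examples = [np.random.randn(n_x, 1) for _ in range(m)]
labels = [0, 1, 1]  # y^(i) in {0, 1}

X = np.hstack(examples)             # shape (n_x, m): examples as columns
Y = np.array(labels).reshape(1, m)  # shape (1, m)

assert X.shape == (n_x, m) and Y.shape == (1, m)
```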

2. Basic methods

$\hat{y} = \sigma(w^{T}x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$
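A small NumPy sketch of this prediction formula (the names `sigmoid` and `predict` are assumptions, and the shapes follow the column-per-example convention from section 1):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """y_hat = sigma(w^T x + b) for every column x of X.

    w: (n_x, 1), b: scalar, X: (n_x, m) -> y_hat: (1, m).
    """
    return sigmoid(w.T @ X + b)
```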

Given $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, want $\hat{y}^{(i)} \approx y^{(i)}$

Loss (error) function: \(L(\hat{y}, y) = -[y\log{\hat{y}} + (1 - y)\log{(1 - \hat{y})}]\)

Cost function: \(J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)})\)
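A vectorized NumPy sketch of the cost, averaging the per-example loss over the $m$ columns (`cost` is an illustrative name; shapes as in section 1):

```python
import numpy as np

def cost(Y_hat, Y):
    """J(w, b) = (1/m) * sum_i L(y_hat^(i), y^(i)), where
    L(y_hat, y) = -[y*log(y_hat) + (1 - y)*log(1 - y_hat)].

    Y_hat, Y: arrays of shape (1, m).
    """
    m = Y.shape[1]
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return losses.sum() / m
```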

Gradient descent: \(w := w - \alpha \frac{\partial J(w,b)}{\partial w}\), and similarly \(b := b - \alpha \frac{\partial J(w,b)}{\partial b}\)

$\alpha$: learning rate
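A single-step NumPy sketch of the update. It assumes the standard closed-form logistic-regression gradients, $\frac{\partial J}{\partial w} = \frac{1}{m}X(\hat{Y} - Y)^{T}$ and $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i}(\hat{y}^{(i)} - y^{(i)})$, which are not derived in these notes:

```python
import numpy as np

def gradient_descent_step(w, b, X, Y, alpha):
    """One update: w := w - alpha * dJ/dw, b := b - alpha * dJ/db.

    Gradient formulas are the standard logistic-regression results
    (an assumption; not derived in these notes).
    """
    m = X.shape[1]
    Y_hat = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))  # sigma(w^T x + b), shape (1, m)
    dZ = Y_hat - Y                                # shape (1, m)
    dw = (X @ dZ.T) / m                           # dJ/dw, shape (n_x, 1)
    db = dZ.sum() / m                             # dJ/db, scalar
    return w - alpha * dw, b - alpha * db
```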