Neural Network Representation

I. A Shallow Neural Network Representation

Let’s use a shallow neural network for educational purposes. Shallow neural networks are neural networks with only one hidden layer. Therefore, there are only 3 layers in this neural network: an input layer, a hidden layer, and an output layer.

Let’s denote $X$ as our input, which has 3 features. Therefore,

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

Meanwhile, $y$ is our label, which is a scalar.

From the given information, we define the following shallow neural network:

[Figure: Shallow Neural Network]

You may notice some new notation in the network diagram, but do not worry; we will explain it now.

The $i$-th layer is commonly represented as $a^{[i]}$, where $a$ stands for activation and $i$ indicates the $i$-th layer. Although no activation function is applied to the input layer, it is sometimes referred to as the $a^{[0]}$ layer; the index $i$ then counts from 1 at the first hidden layer up to the output layer. Moreover, $a^{[i]}$ is a vector:

$$a^{[i]} = \begin{bmatrix} a^{[i]}_1 \\ \vdots \\ a^{[i]}_n \end{bmatrix}$$

The $j$-th element of the $i$-th layer is denoted as $a^{[i]}_j$. For example, the first neuron in the first hidden layer is referred to as $a^{[1]}_1$.

Let’s talk about weights and biases. The weight matrix and the bias of the $i$-th layer are denoted as $W^{[i]}$ and $b^{[i]}$, respectively. Their shapes are the following:

$$W^{[i]} = (n_i, n_{i-1})$$

and

$$b^{[i]} = (n_i, 1)$$

where $n_i$ is the number of neurons in the $i$-th layer.

For example, in our network, $W^{[1]} = (4, 3)$ and $b^{[1]} = (4, 1)$, since there are 4 neurons in the first hidden layer and 3 features in the input layer.
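
As a minimal sketch of how these shapes translate into code (using NumPy; the variable names W1, b1, W2, b2 are my own and not part of the original notation), here is a parameter setup for our 3-feature input and 4-neuron hidden layer:

```python
import numpy as np

n_x = 3   # number of input features (n_0)
n_1 = 4   # neurons in the first hidden layer
n_2 = 1   # neurons in the output layer

# W[i] has shape (n_i, n_{i-1}); b[i] has shape (n_i, 1)
W1 = np.random.randn(n_1, n_x) * 0.01   # (4, 3)
b1 = np.zeros((n_1, 1))                 # (4, 1)
W2 = np.random.randn(n_2, n_1) * 0.01   # (1, 4)
b2 = np.zeros((n_2, 1))                 # (1, 1)

print(W1.shape, b1.shape)   # (4, 3) (4, 1)
```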

II. Computing a Neural Network Output

1. Single training example

Let’s do a forward pass on our neural network. The value of the first neuron in the first hidden layer is:

$$a^{[1]}_1 = \sigma(W^{[1]T}_1 x + b^{[1]}_1)$$

This applies the sigmoid activation function to a linear combination of the inputs. Similarly, we can compute the remaining neurons:

$$\begin{aligned}
a^{[1]}_1 &= \sigma(W^{[1]T}_1 x + b^{[1]}_1) \\
a^{[1]}_2 &= \sigma(W^{[1]T}_2 x + b^{[1]}_2) \\
a^{[1]}_3 &= \sigma(W^{[1]T}_3 x + b^{[1]}_3) \\
a^{[1]}_4 &= \sigma(W^{[1]T}_4 x + b^{[1]}_4)
\end{aligned}$$

We can vectorize these equations as follows:

$$\begin{bmatrix} W^{[1]T}_1 \\ W^{[1]T}_2 \\ W^{[1]T}_3 \\ W^{[1]T}_4 \end{bmatrix} x + \begin{bmatrix} b^{[1]}_1 \\ b^{[1]}_2 \\ b^{[1]}_3 \\ b^{[1]}_4 \end{bmatrix} = W^{[1]} x + b^{[1]}$$

where $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, so that $a^{[1]} = \sigma(W^{[1]} x + b^{[1]})$.
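
To make the single-example forward pass concrete, here is a small NumPy sketch; the `sigmoid` helper, the example input values, and the variable names are assumptions for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.2], [3.0]])   # a single input x, shape (3, 1)

W1 = np.random.randn(4, 3) * 0.01      # W[1], shape (4, 3)
b1 = np.zeros((4, 1))                  # b[1], shape (4, 1)

z1 = W1 @ x + b1                       # all four linear terms at once, shape (4, 1)
a1 = sigmoid(z1)                       # a[1] = sigma(W[1] x + b[1]), shape (4, 1)
print(a1.shape)                        # (4, 1)
```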

2. Multiple training examples

Let $X = \begin{bmatrix} x^{(1)} & x^{(2)} & \dots & x^{(m)} \end{bmatrix}$, where each column $x^{(i)}$ is one training example and $m$ is the number of examples.

Then, using matrix multiplication, we can calculate the hidden layer’s activation matrix:

$$A^{[1]} = \sigma(W^{[1]}X + b^{[1]})$$

where

  • $X: (n_{\text{features}}, m)$
  • $W^{[i]}: (n_i, n_{i-1})$
  • $b^{[i]}: (n_i, 1)$, broadcast across the $m$ columns
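
As a rough sketch under the same assumptions as before (NumPy, made-up values for $m$ and the inputs), the vectorized computation relies on broadcasting to add $b^{[1]}$ to every column of $W^{[1]}X$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = 5                              # number of training examples (made up for this demo)
X = np.random.randn(3, m)          # column i is example x^(i); shape (n_features, m) = (3, 5)

W1 = np.random.randn(4, 3) * 0.01  # W[1], shape (n_1, n_features)
b1 = np.zeros((4, 1))              # b[1], shape (n_1, 1); broadcast across the m columns

A1 = sigmoid(W1 @ X + b1)          # A[1], shape (4, m); column i is a[1] for x^(i)
print(A1.shape)                    # (4, 5)
```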