Recurrent Neural Networks (Colab code Included).
In the world of artificial intelligence, few innovations have been as transformative as Recurrent Neural Networks (RNNs). These specialized types of neural networks are designed to recognize patterns in sequences of data, making them invaluable for tasks ranging from language translation to time series prediction. In this blog, we’ll explore what RNNs are, how they work, their variants, challenges, and some of their most exciting applications.
What is a Recurrent Neural Network?
A Recurrent Neural Network (RNN) is a type of artificial neural network where connections between nodes form a directed graph along a temporal sequence. This structure allows RNNs to exhibit temporal dynamic behavior, enabling them to process sequences of data. Unlike traditional neural networks, which assume inputs and outputs are independent of each other, RNNs leverage their internal state (memory) to process sequences of inputs, making them particularly effective for tasks where context or history matters.
Components of an RNN:
To fully grasp the inner workings of Recurrent Neural Networks (RNNs), it’s essential to understand their fundamental components. These components work together to allow RNNs to process sequential data effectively. Here, we break down the key components of RNNs:
1. Input Layer:
The input layer in an RNN takes in the sequential data. Each element in the sequence is fed into the network one step at a time. The data could be anything from time-series data to sentences where each word or character is an input.
- Input Vector (xₜ): Represents the input at time step t. For instance, in natural language processing, xₜ could be a one-hot encoded vector representing a word, as sketched below.
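As a tiny, self-contained illustration (a toy vocabulary assumed here, not from the original post), a one-hot input vector can be built like this:

import numpy as np

vocabulary = ["the", "car", "is", "fast"]   # toy vocabulary, purely for illustration
word = "car"                                # the word observed at time step t
x_t = np.zeros(len(vocabulary))
x_t[vocabulary.index(word)] = 1.0           # one-hot encoding: a 1 at the word's index
print(x_t)                                  # [0. 1. 0. 0.]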
2. Hidden Layer:
The hidden layer is the core of an RNN. It consists of a series of neurons that maintain the state of the network and update it based on new inputs. The hidden state acts as the memory of the network.
- Hidden State (hₜ): This is the memory of the network at time step t. It captures information from previous inputs and is updated with each new input. The hidden state is computed as:
hₜ = f(W_hx * xₜ + W_hh * hₜ₋₁ + b_h)
where:
- f is a non-linear activation function (such as tanh or ReLU).
- W_hx is the weight matrix from the input to the hidden state.
- W_hh is the weight matrix from the hidden state to the hidden state.
- b_h is the bias vector for the hidden layer.
3. Output Layer:
The output layer generates the output of the network at each time step. This output can be used for predictions or passed on to the next layer or time step.
- Output Vector (yₜ): The output at time step t, often computed as:
yₜ = g(W_hy * hₜ + b_y)
where:
- g is an activation function (such as softmax for classification tasks).
- W_hy is the weight matrix from the hidden state to the output.
- b_y is the bias vector for the output layer.
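To make the hidden-state and output formulas above concrete, here is a minimal NumPy sketch of a single RNN forward step. The dimensions and random weights are toy values assumed only for illustration:

import numpy as np

input_size, hidden_size, output_size = 3, 5, 2          # toy sizes (assumed)
rng = np.random.default_rng(0)
W_hx = rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights
W_hy = rng.standard_normal((output_size, hidden_size))  # hidden-to-output weights
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_t = rng.standard_normal(input_size)   # input at time step t
h_prev = np.zeros(hidden_size)          # previous hidden state h_{t-1}
h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)   # hₜ = f(W_hx·xₜ + W_hh·hₜ₋₁ + b_h)
y_t = softmax(W_hy @ h_t + b_y)                   # yₜ = g(W_hy·hₜ + b_y)
print(h_t.shape, y_t)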
4. Weights and Biases
Weights and biases are the parameters of the network that get adjusted during the training process to minimize the error in predictions.
- Weight Matrices: These include W_hx, W_hh, and W_hy. Each matrix is crucial in transforming the inputs and hidden states.
- Bias Vectors: These include b_h and b_y, which are added to the hidden-state and output computations to help the network learn more effectively.
5. Activation Functions:
Activation functions introduce non-linearity into the network, enabling it to learn complex patterns. Common activation functions in RNNs include:
- Tanh: Often used in the hidden layer to maintain values between -1 and 1.
- ReLU (Rectified Linear Unit): Can also be used but is less common in vanilla RNNs.
- Softmax: Typically used in the output layer for classification tasks to convert raw output scores into probabilities.
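A quick demonstration of how these functions behave on a few toy values (numbers chosen only for illustration):

import numpy as np

z = np.array([-2.0, 0.0, 3.0])
print(np.tanh(z))                      # tanh squashes values into (-1, 1)
print(np.maximum(0, z))                # ReLU zeroes out negative values
probs = np.exp(z) / np.exp(z).sum()    # softmax turns raw scores into probabilities
print(probs, probs.sum())              # probabilities sum to 1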
6. Backpropagation Through Time (BPTT):
BPTT is the learning algorithm used to train RNNs. It extends the standard backpropagation algorithm to handle the temporal aspect of RNNs.
- Gradient Computation: Gradients are computed for each time step and summed up.
- Weight Updates: Weights are updated based on the gradients to minimize the loss function.
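The sketch below is a toy example (made-up shapes, and TensorFlow's automatic differentiation in place of a hand-written backward pass) that illustrates the idea: the forward pass unrolls the RNN over every time step, and the gradient of the loss with respect to each shared weight accumulates contributions from all of those steps.

import tensorflow as tf

tf.random.set_seed(0)
x = tf.random.normal((1, 4, 3))      # 1 sequence, 4 time steps, 3 features (toy shapes)
target = tf.random.normal((1, 1))    # a single regression target

rnn = tf.keras.layers.SimpleRNN(units=5)
head = tf.keras.layers.Dense(1)

with tf.GradientTape() as tape:
    h_last = rnn(x)                                      # forward pass unrolled over the 4 steps
    y_pred = head(h_last)
    loss = tf.reduce_mean(tf.square(y_pred - target))    # MSE loss

# One gradient per shared weight; each sums contributions from every time step.
grads = tape.gradient(loss, rnn.trainable_variables + head.trainable_variables)
for g in grads:
    print(g.shape)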
7. Loss Function:
The loss function measures the error between the predicted output and the actual target. Common loss functions used in RNNs include:
- Cross-Entropy Loss: Often used in classification tasks.
- Mean Squared Error (MSE): Commonly used in regression tasks.
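For instance, with toy numbers (chosen only for illustration):

import numpy as np

# Cross-entropy for one classification example with 3 classes
p_pred = np.array([0.7, 0.2, 0.1])    # predicted probabilities (e.g. from softmax)
y_true = np.array([1, 0, 0])          # one-hot true label
print(-np.sum(y_true * np.log(p_pred)))   # ≈ 0.357

# Mean squared error for a small regression example
y_hat = np.array([2.5, 0.0, 2.0])
y = np.array([3.0, -0.5, 2.0])
print(np.mean((y_hat - y) ** 2))          # ≈ 0.167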
Diagram of an RNN:
Here’s a detailed diagram illustrating the components of an RNN:
A Recurrent Neural Network (RNN) stands out for its ability to handle sequential data by processing it one step at a time. At the heart of its operation lies a remarkable feature: the sharing of parameters across all time steps. This mechanism keeps the network’s computations consistent and efficient, allowing it to effectively capture temporal dependencies and extract meaningful patterns from sequential data.
Imagine feeding an input vector, denoted as X, into the RNN. As the data is scanned from left to right, the network dynamically updates its hidden state at each time step while concurrently generating an output vector, represented by y. Crucially, the same set of parameters, comprising U, V, and W, governs these transformations throughout the entire process.
Let’s delve into the significance of these parameters. Firstly, U dictates the weight associated with the connection between the input layer, X, and the hidden layer, h. This weight parameter determines how information from the input is processed and integrated into the evolving hidden state of the network. Meanwhile, W plays a pivotal role in managing the connections within the hidden layers themselves. This parameter regulates the flow of information and interactions between successive hidden states, facilitating the propagation of contextual information across time steps.
Lastly, V governs the connection from the hidden layer, h, to the output layer, y. This weight parameter influences how the information encoded in the hidden state is transformed into the final output of the network. By adjusting the weights associated with these connections, the RNN learns to extract relevant features from the input sequence and produce meaningful predictions or classifications.
By sharing these parameters across all time steps, the RNN leverages the wealth of information contained in sequential data more effectively. It encodes the history of past inputs within its current hidden state, enabling it to retain context and capture long-range dependencies. This capacity is particularly valuable in tasks such as natural language processing, speech recognition, and time series prediction, where understanding temporal relationships is paramount.
In essence, the parameter sharing mechanism of the RNN serves as a powerful tool for processing sequential data. It empowers the network to learn from the past, adapt to changing contexts, and make informed predictions or decisions based on the rich temporal dynamics inherent in the data.
At each time step t, the hidden state aₜ is computed from the current input xₜ, the previous hidden state aₜ₋₁, and the model parameters, as given by the following formula:
aₜ = f(aₜ₋₁, xₜ; θ)    (1)
It can also be written as:
aₜ = f(U * xₜ + W * aₜ₋₁ + b)
where:
- aₜ is the output of the hidden layer at time step t.
- xₜ is the input at time step t.
- θ is the set of learnable parameters (weights and biases).
- U is the weight matrix governing the connections from the input to the hidden layer; U ∈ θ.
- W is the weight matrix governing the connections from the hidden layer to itself (the recurrent connections); W ∈ θ.
- V is the weight matrix governing the connection between the hidden layer and the output layer; V ∈ θ.
- aₜ₋₁ is the output of the hidden layer at time step t-1.
- b is the bias vector for the hidden layer; b ∈ θ.
- f is the activation function.
For a finite number of time steps T=4, we can expand the computation graph of a Recurrent Neural Network, illustrated in Figure 3, by applying equation (1) T-1 times.
a₄ = f(a₃, x₄; θ)    (2)
Equation (2) can be expanded as,
a₄ = f(U * x₄ + W * a₃ + b)
a₃ = f(U * x₃ + W * a₂ + b)
a₂ = f(U * x₂ + W * a₁ + b)
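The same unrolling can be expressed as a short loop. Note that the same U, W, and b are reused at every step; the dimensions below are toy values assumed only for illustration:

import numpy as np

rng = np.random.default_rng(1)
T, input_size, hidden_size = 4, 3, 5                  # T = 4 time steps (toy sizes)
U = rng.standard_normal((hidden_size, input_size))    # input-to-hidden weights
W = rng.standard_normal((hidden_size, hidden_size))   # hidden-to-hidden weights
b = np.zeros(hidden_size)

x = rng.standard_normal((T, input_size))   # x₁ … x₄
a = np.zeros(hidden_size)                  # initial hidden state a₀
for t in range(T):
    a = np.tanh(U @ x[t] + W @ a + b)      # aₜ = f(U·xₜ + W·aₜ₋₁ + b), same parameters each step
print(a)                                   # a₄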
The output at each time step t, denoted ŷₜ, is computed from the hidden state aₜ using the following formula:
ŷₜ = f(aₜ; θ)    (3)
Equation (3) can be written as:
ŷₜ = f(V * aₜ + c)
When t = 4: ŷ₄ = f(V * a₄ + c)
where:
- ŷₜ is the output predicted at time step t.
- V is the weight matrix governing the connections from the hidden layer to the output layer.
- c is the bias vector for the output layer.
Below is a practical implementation in Python that I did on Kaggle:
TESLA Stock Price Prediction using Recurrent Neural Networks (RNN)
Link to my work:
https://www.kaggle.com/code/beeru999/rnn-analyzing-tesla-stock-trends-using-rnn
1. Importing Libraries
First, we import the necessary libraries. These libraries help us with various tasks like algebraic operations, data analysis, and visualization.
import numpy as np # For algebraic operations
import pandas as pd # For data analysis and manipulation
import seaborn as sns # For data visualization
import matplotlib.pyplot as plt # For plotting graphs
2. Data Processing
We load the stock price data for Tesla from a CSV file.
data = pd.read_csv("/kaggle/input/price-volume-data-for-all-us-stocks-etfs/Stocks/tsla.us.txt")
3. Exploratory Data Analysis (EDA)
We perform a quick EDA to understand the data.
- Displaying the first few rows of the dataset.
- Checking the shape of the dataset (number of rows and columns).
- Describing the dataset to get statistical information.
- Checking for any missing values in the dataset.
data.head()
data.shape
data.describe()
data.isna().sum()
4. Splitting the Data into Train-Test Sets:
We split the data into training and testing sets. We use 80% of the data for training and the remaining 20% for testing.
training_size = int(len(data)*0.80)
data_len = len(data)
train = data[0:training_size]
test = data[training_size:data_len]
print("Training Size → ", training_size)
print("total length of data → ", data_len)
print("Train length → ", len(train))
print("Test length → ", len(test))
5. Scaling the Data
We scale the data using Min-Max Scaler to ensure all values are between 0 and 1, which is important for training the neural network.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
train = train.loc[:, ["Open"]].values
train_scaled = scaler.fit_transform(train)
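As a quick sanity check (not in the original notebook), the scaled training values should now lie between 0 and 1, since Min-Max scaling maps each value to (x - x_min) / (x_max - x_min):

print(train_scaled.min(), train_scaled.max())   # expected: 0.0 1.0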
6. Preparing the Data for the RNN Model
We create input sequences and corresponding labels for training the RNN. Each input sequence will contain 40 timesteps of data.
X_train = []
y_train = []
timesteps = 40
end_len = len(train_scaled)
for i in range(timesteps, end_len):
    X_train.append(train_scaled[i - timesteps:i, 0])
    y_train.append(train_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
print("X_train → ", X_train.shape)
print("y_train shape → ", y_train.shape)
7. Building the RNN Model
We use Keras to build a Sequential model with SimpleRNN layers and Dropout for regularization.
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Dropout
regressor = Sequential()
regressor.add(SimpleRNN(units=50, activation="tanh", return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units=50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units=1))
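To inspect the resulting architecture and parameter counts (an optional step, not in the original notebook):

regressor.summary()   # lists the four SimpleRNN layers, the Dropout layers, and the Dense output layer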
8. Compiling the Model
We compile the model using the Adam optimizer and Mean Squared Error (MSE) as the loss function.
regressor.compile(optimizer="adam", loss="mean_squared_error")
9. Training the Model
We train the model for 100 epochs with a batch size of 20.
epochs = 100
batch_size = 20
regressor.fit(X_train, y_train, epochs = epochs, batch_size = batch_size)
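To check that training converged (an optional step, not in the original notebook), capture the return value of fit, e.g. history = regressor.fit(...), and plot the loss recorded per epoch:

plt.plot(history.history["loss"])   # per-epoch training loss recorded by Keras
plt.title("Training Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.show()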
10. Preparing Test Data
We prepare the test data by scaling and reshaping it similar to the training data.
real_price = test.loc[:, ["Open"]].values
# Use the full "Open" series so the first test window gets the last `timesteps` training prices as context
dataset_total = data["Open"]
inputs = dataset_total[len(dataset_total) - len(test) - timesteps:].values.reshape(-1, 1)
inputs = scaler.transform(inputs)
X_test = []
for i in range(timesteps, len(inputs)):   # len(inputs) = len(test) + timesteps
    X_test.append(inputs[i - timesteps:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
print("X_test shape → ", X_test.shape)
11. Making Predictions
We use the trained model to make predictions on the test data and then scale the predictions back to the original range.
predict = regressor.predict(X_test)
predict = scaler.inverse_transform(predict)
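As a quick numerical check of the fit (not part of the original notebook), you can compute the root mean squared error between the real and predicted prices; with the indexing above, both arrays have one row per test day:

from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(real_price, predict))
print("Test RMSE → ", rmse)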
12. Plotting the Results
Finally, we plot the real stock prices versus the predicted stock prices to visualize the model’s performance.
plt.plot(real_price, color="orange", label="Real Stock Price")
plt.plot(predict, color="black", label="Predicted Stock Price")
plt.title("Stock Price Prediction")
plt.xlabel("Time")
plt.ylabel("Tesla Stock Price")
plt.legend()
plt.show()
Conclusion
In this project, we built a Recurrent Neural Network (RNN) to predict Tesla’s stock prices using historical data. We performed data preprocessing, scaled the data, created input sequences, built and trained an RNN model, and visualized the results. The predictions made by the RNN model closely follow the actual stock prices, demonstrating the effectiveness of RNNs for time series prediction tasks.
Summary and Key Points of the RNN Model
In this project, we built a Recurrent Neural Network (RNN) model to predict Tesla’s stock prices using historical data. Here are the key steps and components involved:
Data Loading and Exploration:
- The dataset containing Tesla’s stock prices was loaded using pandas.
- Basic exploratory data analysis (EDA) was performed to understand the dataset’s structure, summary statistics, and to check for any missing values.
Data Preprocessing:
- The dataset was split into training and test sets (80% for training and 20% for testing).
- The stock prices were scaled using MinMaxScaler to normalize the data, which is a common preprocessing step for neural networks to improve convergence speed and model performance.
- The training data was transformed into sequences of 40 time steps to create the input features (X_train) and the target variable (y_train).
RNN Model Construction:
- A Sequential model was created using Keras.
- Multiple SimpleRNN layers were added to the model, each with 50 units and a tanh activation function.
- Dropout layers were included to prevent overfitting by randomly setting a fraction of input units to 0 at each update during training.
- A Dense layer with a single neuron was added as the output layer to predict the stock price.
Model Compilation and Training:
- The model was compiled using the Adam optimizer and mean squared error (MSE) as the loss function.
- The model was trained for 100 epochs with a batch size of 20.
Test Data Preparation and Prediction:
- The test data was prepared similarly to the training data, ensuring the same sequence length and scaling.
- Predictions were made on the test set using the trained model.
- The predicted values were inverse-transformed back to their original scale for comparison with the actual stock prices.
Visualization:
- The actual and predicted stock prices were plotted using matplotlib to visually compare the model’s performance.
Key Points of RNN Model:
- Sequential Nature: RNNs are designed to handle sequential data, making them suitable for time series prediction tasks like stock price forecasting.
- Data Normalization: Scaling the data using techniques like MinMaxScaler is crucial for improving the performance of neural networks.
- Time Steps: Creating sequences of historical data points (timesteps) is essential to capture temporal patterns in the data.
- Dropout Layers: Adding dropout layers helps prevent overfitting by introducing regularization during training.
- Evaluation and Visualization: Comparing the predicted values with actual values using visual plots provides insights into the model’s performance and areas for improvement.
I hope you learned something new, or that this served as a useful revision. Thanks for reading! Do check out my Kaggle and GitHub profiles linked below; your support matters a lot.
- GitHub: https://github.com/Darshan0902
- LinkedIn: https://www.linkedin.com/in/darshanprabhu009/
- Kaggle: https://www.kaggle.com/beeru999
Regards,
Darshan D. Prabhu
(Aao Code Kare)