RNN

  • RNNs share the same parameters across all time steps; each time step can also emit an output y (see the sketch after this list).
  • The backpropagation here is called backpropagation through time because the gradients flow back through the activations a from later sequence elements to earlier ones, i.e. backwards in time.
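
A minimal NumPy sketch of an unrolled RNN forward pass, to make the parameter sharing across time steps concrete. The function name rnn_forward and the parameter names Wax, Waa, Wya, ba, by are illustrative assumptions, not the course's exact code.

```python
import numpy as np

def rnn_forward(x, a0, Wax, Waa, Wya, ba, by):
    """Unrolled RNN forward pass: the SAME Wax, Waa, Wya, ba, by are reused at every time step.
    x: (n_x, m, T_x) inputs, a0: (n_a, m) initial hidden state."""
    n_a, m = a0.shape
    n_y, T_x = Wya.shape[0], x.shape[2]
    a = np.zeros((n_a, m, T_x))        # hidden activations, cached for backprop through time
    y_hat = np.zeros((n_y, m, T_x))    # per-step predictions
    a_prev = a0
    for t in range(T_x):
        # a<t> = tanh(Waa a<t-1> + Wax x<t> + ba): shared parameters, activation passed forward in time
        a_prev = np.tanh(Waa @ a_prev + Wax @ x[:, :, t] + ba)
        z = Wya @ a_prev + by
        y_hat[:, :, t] = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)  # softmax over outputs
        a[:, :, t] = a_prev
    return a, y_hat
```

Backpropagation through time then walks these cached activations from t = T_x back to t = 1, accumulating gradients for the shared parameters.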

Language model and sequence generation

  • RNNs work very well for language modeling problems.
  • To train the language model, the network learns the conditional probability of each word given the previous words; that is what the model is trained for. The loss function is the cross-entropy loss: at each time step, L(y_hat<t>, y<t>) = -sum_i y_i<t> * log(y_hat_i<t>), and the overall loss sums this over all time steps of the sequence.
  • Character-level language models also exist (the lab with dinosaur names). The model we used had the following steps in the optimization loop (parameters are initialized once at the very beginning):
    • Forward propagation to compute the loss function
    • Backward propagation to compute the gradients with respect to the loss function
    • Clip the gradients, then update the parameters with the gradient descent update rule (a sketch of this step follows after this list).
  • At each time step the RNN tries to predict the next character given the previous characters.
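
A sketch of the gradient-clipping and update steps of that optimization loop, assuming parameters and gradients are kept in dictionaries as in the lab; clip, update_parameters, and the commented-out forward_prop / backward_prop names are assumptions standing in for the lab's actual functions.

```python
import numpy as np

def clip(gradients, max_value):
    """Clip every gradient elementwise into [-max_value, max_value] to prevent exploding gradients."""
    for grad in gradients.values():
        np.clip(grad, -max_value, max_value, out=grad)
    return gradients

def update_parameters(parameters, gradients, lr):
    """Gradient descent update rule: theta <- theta - lr * d_theta."""
    for name in parameters:
        parameters[name] -= lr * gradients["d" + name]
    return parameters

# One optimization step (forward_prop / backward_prop are placeholders for the lab's functions):
#   loss, cache = forward_prop(X, Y, parameters)
#   gradients = backward_prop(X, Y, parameters, cache)
#   gradients = clip(gradients, max_value=5.0)
#   parameters = update_parameters(parameters, gradients, lr=0.01)
```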

GRU and LSTM

  • These units give the network a longer-term memory. LSTM is the more general version and was invented first; GRU came more recently and has been gaining popularity. GRUs are simpler, so it is easier to scale them to different problems, but historically LSTM has been the more proven choice (a GRU cell sketch follows below).
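
A minimal NumPy sketch of one full GRU step, to make the "memory" idea concrete: the update gate decides how much of the old memory cell to keep. The function name gru_cell and the parameter names Wu, Wr, Wc, bu, br, bc are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x_t, c_prev, Wu, Wr, Wc, bu, br, bc):
    """One GRU step. c_prev: (n_a, m) memory cell from the previous step, x_t: (n_x, m) input."""
    concat = np.concatenate([c_prev, x_t], axis=0)
    gamma_u = sigmoid(Wu @ concat + bu)   # update gate: how much new memory to let in
    gamma_r = sigmoid(Wr @ concat + br)   # relevance (reset) gate: how much old memory feeds the candidate
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t], axis=0) + bc)  # candidate memory
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev  # keep the old memory where gamma_u is near 0
    return c_t
```

An LSTM adds separate forget and output gates and keeps the memory cell c distinct from the hidden activation a.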

Bidirectional RNN

  • Has both a forward and a backward recurrence over the input, so the prediction at each time step can use future context as well as past context. The recurrent blocks can also be LSTM cells; bidirectional LSTMs are common in NLP problems (a sketch follows below).
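
An assumed Keras example of a bidirectional LSTM producing a prediction at every time step; the layer sizes (64 input features, 32 LSTM units, 10 output classes) are arbitrary placeholders.

```python
import tensorflow as tf

# Each output at time t can use both past (forward LSTM) and future (backward LSTM) context.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 64)),                          # (time steps, features per step)
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)),      # forward + backward LSTM, outputs concatenated
    tf.keras.layers.Dense(10, activation="softmax"),           # per-time-step prediction
])
model.summary()
```

Note that the whole input sequence must be available before predicting, which is why bidirectional models suit offline NLP tasks better than real-time ones.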

Deep RNNs