Deep Recurrent Neural Networks
Up until now, we have focused on defining networks consisting of a sequence input, a single hidden RNN layer, and an output layer. Despite having just one hidden layer between the input at any time step and the corresponding output, there is a sense in which these networks are deep. Inputs from the first time step can influence the outputs at the final time step $T$ (often 100s or 1000s of steps later). These inputs pass through $T$ applications of the recurrent layer before reaching the final output. However, we often also wish to retain the ability to express complex relationships between the inputs at a given time step and the outputs at that same time step. Thus we often construct RNNs that are deep not only in the time direction but also in the input-to-output direction. This is precisely the notion of depth that we have already encountered in our development of MLPs and deep CNNs.
So far, we have only discussed RNNs with a single unidirectional hidden layer. In such a model, the specific functional form of how latent variables and observations interact is rather arbitrary. This is not a big problem as long as we have enough flexibility to model different kinds of interactions. With a single layer, however, this can be quite challenging. In the case of linear models, we fixed this problem by adding more layers. Within RNNs this is trickier, since we first need to decide how and where to add extra nonlinearity.
In fact, we can stack multiple layers of RNNs on top of one another. Through the combination of several simple layers, this yields a flexible mechanism. In particular, data might be relevant at different levels of the stack. For instance, we might want to keep high-level data about financial market conditions (bear or bull market) available, whereas at a lower level we only record shorter-term temporal dynamics.
In a deep RNN with $L$ hidden layers, each hidden state is continuously passed to both the next time step of the current layer and the current time step of the next layer.

Formally, suppose that we have a minibatch input $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ (number of examples $n$; number of inputs in each example $d$) at time step $t$. At the same time step, let the hidden state of the $l^\textrm{th}$ hidden layer ($l=1,\ldots,L$) be $\mathbf{H}_t^{(l)} \in \mathbb{R}^{n \times h}$ (number of hidden units $h$) and the output layer variable be $\mathbf{O}_t \in \mathbb{R}^{n \times q}$ (number of outputs: $q$). Setting $\mathbf{H}_t^{(0)} = \mathbf{X}_t$, the hidden state of the $l^\textrm{th}$ hidden layer that uses the activation function $\phi_l$ is calculated as follows:

$$\mathbf{H}_t^{(l)} = \phi_l\left(\mathbf{H}_t^{(l-1)} \mathbf{W}_{\textrm{xh}}^{(l)} + \mathbf{H}_{t-1}^{(l)} \mathbf{W}_{\textrm{hh}}^{(l)} + \mathbf{b}_\textrm{h}^{(l)}\right),$$

where the weights $\mathbf{W}_{\textrm{xh}}^{(l)}$ and $\mathbf{W}_{\textrm{hh}}^{(l)} \in \mathbb{R}^{h \times h}$, together with the bias $\mathbf{b}_\textrm{h}^{(l)} \in \mathbb{R}^{1 \times h}$, are the model parameters of the $l^\textrm{th}$ hidden layer. In the end, the calculation of the output layer is based only on the hidden state of the final $L^\textrm{th}$ hidden layer:

$$\mathbf{O}_t = \mathbf{H}_t^{(L)} \mathbf{W}_{\textrm{hq}} + \mathbf{b}_\textrm{q},$$

where the weight $\mathbf{W}_{\textrm{hq}} \in \mathbb{R}^{h \times q}$ and the bias $\mathbf{b}_\textrm{q} \in \mathbb{R}^{1 \times q}$ are the model parameters of the output layer.
In other words, the computation above is easy to adjust: replacing the hidden state computation with that of a GRU or an LSTM immediately yields a deep gated recurrent neural network or a deep LSTM.
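To make the stacked recurrence concrete, here is a minimal sketch (my own addition, not code from the original post) of a single time step of a deep RNN in plain PyTorch; the function name `deep_rnn_step` and the parameter layout are illustrative choices:

```python
import torch

def deep_rnn_step(X_t, H_prev, params, phi=torch.tanh):
    """One time step of a deep RNN.

    X_t:    (n, d) minibatch input at time step t.
    H_prev: list of L tensors of shape (n, h), the hidden states from step t-1.
    params: list of L tuples (W_xh, W_hh, b_h); W_xh is (d, h) for the first
            layer and (h, h) for the layers above it.
    Returns the L updated hidden states for time step t.
    """
    H_new = []
    layer_input = X_t                          # H_t^{(0)} = X_t
    for (W_xh, W_hh, b_h), H_l in zip(params, H_prev):
        H_l = phi(layer_input @ W_xh + H_l @ W_hh + b_h)
        H_new.append(H_l)
        layer_input = H_l                      # layer l's state feeds layer l+1
    return H_new
```

The last element of the returned list is $\mathbf{H}_t^{(L)}$, which is what the output layer consumes.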
Bidirectional Recurrent Neural Networks

Formally for any time step $t$, we consider a minibatch input $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ (number of examples $n$; number of inputs in each example $d$) and let the hidden layer activation function be $\phi$. In the bidirectional architecture, the forward and backward hidden states for this time step are $\overrightarrow{\mathbf{H}}_t \in \mathbb{R}^{n \times h}$ and $\overleftarrow{\mathbf{H}}_t \in \mathbb{R}^{n \times h}$, respectively, where $h$ is the number of hidden units. The forward and backward hidden state updates are as follows:

$$\begin{aligned}
\overrightarrow{\mathbf{H}}_t &= \phi\left(\mathbf{X}_t \mathbf{W}_{\textrm{xh}}^{(f)} + \overrightarrow{\mathbf{H}}_{t-1} \mathbf{W}_{\textrm{hh}}^{(f)} + \mathbf{b}_\textrm{h}^{(f)}\right),\\
\overleftarrow{\mathbf{H}}_t &= \phi\left(\mathbf{X}_t \mathbf{W}_{\textrm{xh}}^{(b)} + \overleftarrow{\mathbf{H}}_{t+1} \mathbf{W}_{\textrm{hh}}^{(b)} + \mathbf{b}_\textrm{h}^{(b)}\right),
\end{aligned}$$

where the weights $\mathbf{W}_{\textrm{xh}}^{(f)} \in \mathbb{R}^{d \times h}$, $\mathbf{W}_{\textrm{hh}}^{(f)} \in \mathbb{R}^{h \times h}$, $\mathbf{W}_{\textrm{xh}}^{(b)} \in \mathbb{R}^{d \times h}$, and $\mathbf{W}_{\textrm{hh}}^{(b)} \in \mathbb{R}^{h \times h}$, and the biases $\mathbf{b}_\textrm{h}^{(f)} \in \mathbb{R}^{1 \times h}$ and $\mathbf{b}_\textrm{h}^{(b)} \in \mathbb{R}^{1 \times h}$ are all the model parameters.
Next, we concatenate the forward and backward hidden states $\overrightarrow{\mathbf{H}}_t$ and $\overleftarrow{\mathbf{H}}_t$ to obtain the hidden state $\mathbf{H}_t \in \mathbb{R}^{n \times 2h}$ for feeding into the output layer. In deep bidirectional RNNs with multiple hidden layers, such information is passed on as input to the next bidirectional layer. Last, the output layer computes the output $\mathbf{O}_t \in \mathbb{R}^{n \times q}$ (number of outputs $q$):

$$\mathbf{O}_t = \mathbf{H}_t \mathbf{W}_{\textrm{hq}} + \mathbf{b}_\textrm{q}.$$

Here, the weight matrix $\mathbf{W}_{\textrm{hq}} \in \mathbb{R}^{2h \times q}$ and the bias $\mathbf{b}_\textrm{q} \in \mathbb{R}^{1 \times q}$ are the model parameters of the output layer.
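As a rough from-scratch illustration (a sketch of my own, not code from the original post), the same recurrence can be run left-to-right and right-to-left and the resulting states concatenated before the output layer; the parameter names below are hypothetical:

```python
import torch

def birnn_forward(X, params_f, params_b, W_hq, b_q, phi=torch.tanh):
    """X: (T, n, d) sequence. Returns outputs of shape (T, n, q).

    params_f / params_b: (W_xh, W_hh, b_h) for the forward / backward direction.
    """
    T, n, _ = X.shape
    h = params_f[1].shape[0]
    H_f = torch.zeros(n, h)
    H_b = torch.zeros(n, h)
    fwd, bwd = [], []
    for t in range(T):                       # left-to-right pass
        W_xh, W_hh, b_h = params_f
        H_f = phi(X[t] @ W_xh + H_f @ W_hh + b_h)
        fwd.append(H_f)
    for t in reversed(range(T)):             # right-to-left pass
        W_xh, W_hh, b_h = params_b
        H_b = phi(X[t] @ W_xh + H_b @ W_hh + b_h)
        bwd.append(H_b)
    bwd.reverse()                            # align backward states with time steps
    outputs = [torch.cat((hf, hb), dim=1) @ W_hq + b_q
               for hf, hb in zip(fwd, bwd)]
    return torch.stack(outputs)
```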
Concise Implementation of Deep Recurrent Neural Networks (by GRU)
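A minimal sketch of the concise approach, assuming PyTorch and illustrative sizes (`num_hiddens=32`, `num_layers=2`, and the other values below are my own assumptions): stacking is obtained simply by passing `num_layers` to `nn.GRU`.

```python
import torch
from torch import nn

# Assumed sizes, for illustration only.
num_inputs, num_hiddens, num_layers, num_outputs = 28, 32, 2, 10
batch_size, num_steps = 4, 35

# num_layers > 1 stacks GRU layers: each layer's hidden states are the
# inputs of the layer above at the same time step.
deep_gru = nn.GRU(num_inputs, num_hiddens, num_layers=num_layers)
output_layer = nn.Linear(num_hiddens, num_outputs)

X = torch.randn(num_steps, batch_size, num_inputs)
state = torch.zeros(num_layers, batch_size, num_hiddens)
hiddens, state = deep_gru(X, state)      # hiddens: (num_steps, batch_size, num_hiddens)
outputs = output_layer(hiddens)          # (num_steps, batch_size, num_outputs)
```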
Concise Implementation of Bidirectional RNNs
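Again a minimal sketch under the same assumptions (PyTorch, illustrative sizes): the only change is `bidirectional=True`, which doubles the size of the hidden state that the output layer receives.

```python
import torch
from torch import nn

# Assumed sizes, for illustration only.
num_inputs, num_hiddens, num_outputs = 28, 32, 10
batch_size, num_steps = 4, 35

bi_gru = nn.GRU(num_inputs, num_hiddens, bidirectional=True)
# Forward and backward states are concatenated, hence 2 * num_hiddens.
output_layer = nn.Linear(2 * num_hiddens, num_outputs)

X = torch.randn(num_steps, batch_size, num_inputs)
hiddens, _ = bi_gru(X)                   # hiddens: (num_steps, batch_size, 2 * num_hiddens)
outputs = output_layer(hiddens)          # (num_steps, batch_size, num_outputs)
```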
- Author:tom-ci
- URL:https://www.tomciheng.com//article/d2lv-8
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!