nntrainer
Support Dynamic Time Sequence
Currently, NNTrainer uses addWithReferenceLayers(...) to unroll the subgraph. The maximum number of unrolls is defined by the unroll_for property, and every memory space, including tensors, is allocated accordingly.
However, the time sequence of the input may vary, and if it is shorter than the maximum number of unrolls, we do not need to run the remaining time steps. Skipping them saves computation and improves the accuracy of detecting the end of the sentence.
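For context, here is a minimal sketch of how the statically unrolled recurrent part is declared through the ccapi. It is written from memory: the argument order and the recurrent_input / recurrent_output property names are my best recollection and may differ from the current code; only unroll_for is taken from this issue.

```cpp
#include <memory>

#include <layer.h>
#include <model.h>

// Sketch: declare a one-layer recurrent cell and let the recurrent
// realizer unroll it unroll_for=5 times. Memory for all five copies is
// allocated up front, regardless of the actual input length.
void buildUnrolledModel() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);

  model->addLayer(ml::train::createLayer(
    "input", {"name=input", "input_shape=1:1:10"}));

  std::shared_ptr<ml::train::Layer> cell = ml::train::createLayer(
    "fully_connected", {"name=cell", "unit=10"});

  model->addWithReferenceLayers(
    {cell},       /* reference layers forming the recurrent sub graph */
    "recurrent",  /* scope (name prefix) for the unrolled copies */
    {"input"},    /* external input layers */
    {"cell"},     /* start layers of the sub graph */
    {"cell"},     /* end layers of the sub graph */
    ml::train::ReferenceLayersType::RECURRENT,
    {"unroll_for=5",          /* maximum number of time steps */
     "recurrent_input=cell",  /* property names below are from memory */
     "recurrent_output=cell"});
}
```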
In order to support this, we have two options. Let's consider them below.
Option I: Letting the Network Object control everything
As in the picture, we could make the recurrent part of the network a subgraph and let the NeuralNetwork object run it. In this case, it might be easy to enable dynamic time sequencing, which stops according to the input time length. However, implementing it this way would take a lot of effort; it could require refactoring the whole current implementation.
This is because NNTrainer unrolls the network as shown below.
Option II: Letting the Network Object control Start and End
Another way is to define the Start and End Layer in LayerNode, and whenever the NeuralNetwork model object runs a layer, check whether the layer is the start layer (for forwarding). If it is the start layer, count how many start layers have been computed (that is, how many time steps have been conducted) and compare the count with the time sequence length of the input tensor. If it matches, skip computation until the End Layer appears, and mark the Return Layer (we need this for backward propagation).
Backward propagation needs to be done in the opposite way.
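To make the Option II bookkeeping concrete, here is a minimal sketch of the markers and per-iteration state it would need; RecurrentRole and TimeSequenceState are hypothetical names, not existing NNTrainer types.

```cpp
// Hypothetical markers the recurrent realizer would attach to each
// unrolled LayerNode, plus the bookkeeping the model runner would keep
// per iteration. None of these names exist in NNTrainer today.
enum class RecurrentRole {
  NONE,  // ordinary layer
  START, // first layer of one unrolled time step (counted while forwarding)
  END,   // layer where forwarding resumes after the skipped time steps
};

struct TimeSequenceState {
  unsigned int input_time_steps; // actual time length of the current input
  unsigned int start_count = 0;  // how many START layers have been reached
  bool skipping = false;         // true while unused time steps are skipped
};
```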
TODO List
- [x] Set the `dynamic_time_sequence=true` property for `addWithReferenceLayers`: #1935
- [ ] Set the `Start` and `End Layer` in the Recurrent Realizer
- [ ] Implement counting the time sequence during forwarding (see the forwarding sketch after this list)
  - [ ] Skip layer->forwarding until the End Layer if the counted time sequence is greater than the input time sequence (if the counted time sequence is equal to the input time sequence, set the Return Layer)
  - [ ] If it is the End Layer, check the memory buffer and do forwarding
- [ ] Implement backwarding
  - [ ] When backwarding starts, if it is the End Layer, compute backwarding and skip until the layer is the Return Layer
  - [ ] If it is the Return Layer, check the memory buffer (might need a memory copy) and do the computation
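A minimal sketch of the forwarding items above, reusing the hypothetical RecurrentRole / TimeSequenceState types from the Option II sketch; the `role` member, `forwarding()`, and `copyBufferFrom()` are placeholders standing in for whatever the real per-node calls would be.

```cpp
#include <vector>

// Hypothetical model-runner loop: count START layers, and once more time
// steps have started than the input provides, skip their forwarding until
// the END layer, remembering the Return Layer for backwarding.
LayerNode *forwardingWithDynamicTime(std::vector<LayerNode *> &nodes,
                                     TimeSequenceState &st) {
  LayerNode *return_node = nullptr; // where backwarding will resume
  LayerNode *last_run = nullptr;    // last node that was really forwarded

  for (auto *node : nodes) {
    if (node->role == RecurrentRole::START) {
      ++st.start_count; // one more unrolled time step begins
      if (st.start_count > st.input_time_steps && !st.skipping) {
        st.skipping = true;
        return_node = last_run; // e.g. Layer 3/1 in the Option II diagram
      }
    }

    if (st.skipping) {
      if (node->role != RecurrentRole::END)
        continue; // skip forwarding of the unused time steps
      copyBufferFrom(return_node, node); // might need a memory copy
      st.skipping = false;
    }

    node->forwarding();
    last_run = node;
  }
  return return_node; // nullptr if nothing was skipped
}
```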
For the backward operation, how can we know that a layer is a skip layer? For example, in the 'Option-II' diagram, Layer 3/2 should be skipped, but it seems we can only know it needs to be skipped after investigating Layer 1/2. Or is there another way to memorize the final direction from the end layer? (The Model object runner may need to maintain the final layer.)
The procedure I'm thinking of is:
forwarding
- Count the occurrences of the Start Layer.
- If the Start Layer count == input height (which is supposed to be the time step):
  - Mark the Return Layer (in Option II, it is Layer 3/1).
  - Skip layer->forwarding until the layer is the End Layer.
  - Check the memory buffer (might need a copy).
  - Do end layer->forwarding.
- Do layer->forwarding.
backward propagation (see the backwarding sketch after this list)
- Do layer->backwarding until the layer is the End Layer.
- Skip layer->backwarding until the layer is the Return Layer.
- Check the memory buffer (might need to copy the data).
- Do layer->backwarding from the Return Layer.
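And a matching sketch for the backward pass, iterating the same nodes in reverse. Here end_node / return_node are the End Layer and the Return Layer found during forwarding, and `backwarding()` / `copyGradientTo()` are again placeholders rather than existing NNTrainer calls.

```cpp
#include <vector>

// Hypothetical backward pass: run backwarding normally until the End
// Layer, then skip the never-forwarded time steps until the Return
// Layer, hand it the End Layer's gradient, and continue from there.
void backwardingWithDynamicTime(std::vector<LayerNode *> &nodes,
                                LayerNode *end_node,
                                LayerNode *return_node) {
  bool skipping = false;

  for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
    LayerNode *node = *it;

    if (skipping) {
      if (node != return_node)
        continue; // skip the time steps that never ran in forwarding
      copyGradientTo(end_node, node); // might need a memory copy
      skipping = false;
    }

    node->backwarding();

    // everything between the End Layer and the Return Layer was skipped
    if (node == end_node && return_node != nullptr)
      skipping = true;
  }
}
```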
Nice algorithm :+1:
I thought that jumping directly to the return node could have an advantage in memory usage, but it's impossible to allocate/deallocate memory dynamically with variable-length training input. (It was a very useful offline discussion, thanks.)
What I've understood is that this way we need three properties (start for counting, return/end for marking the return/end layer). How about counting the time sequence in the return layer? That way we could integrate the start layer and the return layer.
I think we need to consider the multiple input/output scenario in the future. We don't know which layer will run first or last, so it might be hard to decide the return layer. (For now, we can't guarantee the ordering result in multiple input/output cases.)