nntrainer
Support Dynamic Time Sequence
Currently, NNTrainer uses addWithReferenceLayers(...) to unroll the subgraph. The maximum number of unrolls is defined by the unroll_for property, and every memory space, including tensors, is allocated accordingly.
However, the time sequence of the input may vary, and if it is shorter than the maximum number of unrolls, we do not need to run the remaining time steps. Skipping them saves computation and improves the accuracy of detecting the end of the sentence.
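For context, here is a minimal sketch of how the statically unrolled recurrent part is declared through the ccapi. It is written from memory: the argument order and the recurrent_input / recurrent_output property names are my best recollection and may differ from the current code; only unroll_for is taken from this issue.

```cpp
#include <memory>

#include <layer.h>
#include <model.h>

// Sketch: declare a one-layer recurrent cell and let the recurrent
// realizer unroll it unroll_for=5 times. Memory for all five copies is
// allocated up front, regardless of the actual input length.
void buildUnrolledModel() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);

  model->addLayer(ml::train::createLayer(
    "input", {"name=input", "input_shape=1:1:10"}));

  std::shared_ptr<ml::train::Layer> cell = ml::train::createLayer(
    "fully_connected", {"name=cell", "unit=10"});

  model->addWithReferenceLayers(
    {cell},       /* reference layers forming the recurrent sub graph */
    "recurrent",  /* scope (name prefix) for the unrolled copies */
    {"input"},    /* external input layers */
    {"cell"},     /* start layers of the sub graph */
    {"cell"},     /* end layers of the sub graph */
    ml::train::ReferenceLayersType::RECURRENT,
    {"unroll_for=5",          /* maximum number of time steps */
     "recurrent_input=cell",  /* property names below are from memory */
     "recurrent_output=cell"});
}
```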
In order to support this, we have two options. Let's consider them below.
Option I: Letting the Network Object control everything
As in the picture, we could make the recurrent part of the network a subgraph and let the NeuralNetwork object run it. In this case, it might be easy to enable dynamic time sequencing, which stops according to the input time length. However, implementing it this way would take a lot of effort; it could require refactoring the whole current implementation.
This is because NNTrainer unrolls the network as shown below.
Option II: Letting the Network Object control Start and End
Another way is to define the Start and End Layer in LayerNode, and whenever the NeuralNetwork model object runs a layer, check whether the layer is the start layer (for forwarding). If it is the start layer, count how many start layers have been computed (that is, how many time steps have been conducted) and compare the count with the time sequence length of the input tensor. If it matches, skip computation until the End Layer appears, and mark the Return Layer (we need this for backward propagation).
Backward propagation needs to be done in the opposite way.
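To make the Option II bookkeeping concrete, here is a minimal sketch of the markers and per-iteration state it would need; RecurrentRole and TimeSequenceState are hypothetical names, not existing NNTrainer types.

```cpp
// Hypothetical markers the recurrent realizer would attach to each
// unrolled LayerNode, plus the bookkeeping the model runner would keep
// per iteration. None of these names exist in NNTrainer today.
enum class RecurrentRole {
  NONE,  // ordinary layer
  START, // first layer of one unrolled time step (counted while forwarding)
  END,   // layer where forwarding resumes after the skipped time steps
};

struct TimeSequenceState {
  unsigned int input_time_steps; // actual time length of the current input
  unsigned int start_count = 0;  // how many START layers have been reached
  bool skipping = false;         // true while unused time steps are skipped
};
```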
TODO List
- [x] Set the `dynamic_time_sequence=true` property for `addWithReferenceLayers`: #1935
- [ ] Set the `Start` and `End Layer` in the Recurrent Realizer
- [ ] Implement counting the time sequence during forwarding (see the forwarding sketch after this list)
  - [ ] Skip layer->forwarding until the End Layer if the counted time sequence is greater than the input time sequence (if the counted time sequence is equal to the input time sequence, set the Return Layer)
  - [ ] If it is the End Layer, check the memory buffer and do forwarding
- [ ] Implement backwarding
  - [ ] When backwarding starts, if it is the End Layer, compute backwarding and skip until the layer is the Return Layer
  - [ ] If it is the Return Layer, check the memory buffer (might need a memory copy) and do the computation
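A minimal sketch of the forwarding items above, reusing the hypothetical RecurrentRole / TimeSequenceState types from the Option II sketch; the `role` member, `forwarding()`, and `copyBufferFrom()` are placeholders standing in for whatever the real per-node calls would be.

```cpp
#include <vector>

// Hypothetical model-runner loop: count START layers, and once more time
// steps have started than the input provides, skip their forwarding until
// the END layer, remembering the Return Layer for backwarding.
LayerNode *forwardingWithDynamicTime(std::vector<LayerNode *> &nodes,
                                     TimeSequenceState &st) {
  LayerNode *return_node = nullptr; // where backwarding will resume
  LayerNode *last_run = nullptr;    // last node that was really forwarded

  for (auto *node : nodes) {
    if (node->role == RecurrentRole::START) {
      ++st.start_count; // one more unrolled time step begins
      if (st.start_count > st.input_time_steps && !st.skipping) {
        st.skipping = true;
        return_node = last_run; // e.g. Layer 3/1 in the Option II diagram
      }
    }

    if (st.skipping) {
      if (node->role != RecurrentRole::END)
        continue; // skip forwarding of the unused time steps
      copyBufferFrom(return_node, node); // might need a memory copy
      st.skipping = false;
    }

    node->forwarding();
    last_run = node;
  }
  return return_node; // nullptr if nothing was skipped
}
```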
For the backward operation, how can we know that a layer is a skip layer? For example, in the 'Option-II' diagram, Layer 3/2 should be skipped, but it seems we can only know it needs to be skipped after investigating Layer 1/2. Or is there another way to memorize the final direction from the end layer? (The Model object runner may need to maintain the final layer.)
The procedure I'm thinking of is:
forwarding
- Count the occurrences of the Start Layer.
- If the Start Layer count == input height (which is supposed to be the time step):
  - Mark the Return Layer (in Option II, it is Layer 3/1).
  - Skip layer->forwarding until the layer is the End Layer.
  - Check the memory buffer (might need a copy).
  - Do end layer->forwarding.
- Do layer->forwarding.
backward propagation (see the backwarding sketch after this list)
- Do layer->backwarding until the layer is the End Layer.
- Skip layer->backwarding until the layer is the Return Layer.
- Check the memory buffer (might need to copy the data).
- Do layer->backwarding from the Return Layer.
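And a matching sketch for the backward pass, iterating the same nodes in reverse. Here end_node / return_node are the End Layer and the Return Layer found during forwarding, and `backwarding()` / `copyGradientTo()` are again placeholders rather than existing NNTrainer calls.

```cpp
#include <vector>

// Hypothetical backward pass: run backwarding normally until the End
// Layer, then skip the never-forwarded time steps until the Return
// Layer, hand it the End Layer's gradient, and continue from there.
void backwardingWithDynamicTime(std::vector<LayerNode *> &nodes,
                                LayerNode *end_node,
                                LayerNode *return_node) {
  bool skipping = false;

  for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
    LayerNode *node = *it;

    if (skipping) {
      if (node != return_node)
        continue; // skip the time steps that never ran in forwarding
      copyGradientTo(end_node, node); // might need a memory copy
      skipping = false;
    }

    node->backwarding();

    // everything between the End Layer and the Return Layer was skipped
    if (node == end_node && return_node != nullptr)
      skipping = true;
  }
}
```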
Nice algorithm :+1:
I thought that jumping directly to the return node could have an advantage in memory usage, but it's impossible to allocate/deallocate memory dynamically with variable-length training input. (It was a very useful offline discussion, thanks.)
What I've understood is that this way we need three properties (start for counting, return/end for marking the return/end layer). How about counting the time sequence in the return layer? That way we could integrate the start layer and the return layer.
I think we need to consider the multiple input/output scenario in the future. We don't know which layer will run first or last, so it might be hard to decide the return layer. (For now, we can't guarantee the ordering result in multiple input/output cases.)