Truncated BPTT
Is it possible to do truncated BPTT currently?
I have a really long time series: 1411889 samples.
This overflows when trying to train on any backend.
We don't have specific support for truncated BPTT at the moment. What you can do is chunk up your sequence and treat the chunks as separate sequences (see the sketch below). That will lose the internal state between chunks, but it will at least allow you to train. Carrying the internal state across chunks, precisely for this use case, is on our agenda (see #57).
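A minimal NumPy sketch of that chunking workaround. The `chunk_series` helper and the chunk length of 200 are just illustrative, and the time-major `(time, batch, features)` layout is an assumption about the expected input format; adjust it to whatever your data iterator actually expects.

```python
import numpy as np


def chunk_series(series, chunk_len):
    """Split a (T, features) series into independent chunks of length chunk_len.

    Returns an array shaped (chunk_len, num_chunks, features), i.e. a
    time-major batch where each chunk is treated as a separate sequence.
    The trailing remainder that doesn't fill a whole chunk is dropped.
    """
    num_chunks = len(series) // chunk_len
    trimmed = series[:num_chunks * chunk_len]
    # (num_chunks, chunk_len, features) -> (chunk_len, num_chunks, features)
    return trimmed.reshape(num_chunks, chunk_len, -1).transpose(1, 0, 2)


# Example: 1411889 samples with 3 features, cut into chunks of 200 steps
data = np.random.randn(1411889, 3)
chunks = chunk_series(data, 200)   # shape (200, 7059, 3); last 89 samples dropped
```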
Yeah, I had considered chunking; however, as you mentioned, the cross-sequence context is lost, which in turn prevents learning truly 'long-term' dependencies. Looking forward to your solution to #57.
Any news on this?
We realized that this was not needed for our current experiments, so we wouldn't be able to test it properly. We also haven't finalized how it should be cleanly integrated with everything else, but the lower-level machinery needed to get and restore context is in place, so it should be possible to write a custom SgdStepper that restores context across forward passes with the help of network.get_context().
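A rough sketch of what such a custom stepper could look like. Only `SgdStepper` and `network.get_context()` are confirmed by the discussion above; the `start`/`run` hooks and the `set_context`-style restore call are assumptions about brainstorm's internals, so treat this as a starting point rather than working code.

```python
import brainstorm as bs


class StatefulSgdStepper(bs.training.SgdStepper):
    """Sketch of an SGD stepper that carries hidden state across chunks."""

    def start(self, net):
        # Assumed hook: called once before training with the network.
        super(StatefulSgdStepper, self).start(net)
        self.context = None  # hidden state saved after the previous chunk

    def run(self):
        if self.context is not None:
            # Hypothetical restore call; the thread only confirms that the
            # low-level machinery for restoring context exists, not its name.
            self.net.set_context(self.context)
        # Assumed to perform the forward pass, backward pass and SGD update.
        super(StatefulSgdStepper, self).run()
        # Confirmed API: capture the recurrent state for the next chunk.
        self.context = self.net.get_context()
```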