Truncated BPTT
Is it possible to do truncated BPTT currently?
I have a really long time series: 1411889 samples.
This overflows when trying to train on any backend.
We don't have specific support for truncated BPTT at the moment. What you can do is chunk up your sequence and treat the chunks as separate sequences (see the sketch below). That will lose the internal state between chunks, but it will at least allow you to train. Carrying the internal state across chunks, precisely for this use case, is on our agenda (see #57).
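A minimal NumPy sketch of that chunking workaround. The `chunk_series` helper and the chunk length of 200 are just illustrative, and the time-major `(time, batch, features)` layout is an assumption about the expected input format; adjust it to whatever your data iterator actually expects.

```python
import numpy as np


def chunk_series(series, chunk_len):
    """Split a (T, features) series into independent chunks of length chunk_len.

    Returns an array shaped (chunk_len, num_chunks, features), i.e. a
    time-major batch where each chunk is treated as a separate sequence.
    The trailing remainder that doesn't fill a whole chunk is dropped.
    """
    num_chunks = len(series) // chunk_len
    trimmed = series[:num_chunks * chunk_len]
    # (num_chunks, chunk_len, features) -> (chunk_len, num_chunks, features)
    return trimmed.reshape(num_chunks, chunk_len, -1).transpose(1, 0, 2)


# Example: 1411889 samples with 3 features, cut into chunks of 200 steps
data = np.random.randn(1411889, 3)
chunks = chunk_series(data, 200)   # shape (200, 7059, 3); last 89 samples dropped
```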
Yeah, I had considered chunking; however, as you mentioned, the cross-sequence context is lost, which in turn prevents learning truly 'long-term' dependencies. Looking forward to your solution to #57.
Any news on this?
We realized that this was not needed for our current experiments, so we wouldn't be able to test it properly. We also haven't finalized how it should be cleanly integrated with everything else, but the lower-level machinery needed to get and restore context is in place, so it should be possible to write a custom SgdStepper that restores context across forward passes with the help of network.get_context().
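A rough sketch of what such a custom stepper could look like. Only `SgdStepper` and `network.get_context()` are confirmed by the discussion above; the `start`/`run` hooks and the `set_context`-style restore call are assumptions about brainstorm's internals, so treat this as a starting point rather than working code.

```python
import brainstorm as bs


class StatefulSgdStepper(bs.training.SgdStepper):
    """Sketch of an SGD stepper that carries hidden state across chunks."""

    def start(self, net):
        # Assumed hook: called once before training with the network.
        super(StatefulSgdStepper, self).start(net)
        self.context = None  # hidden state saved after the previous chunk

    def run(self):
        if self.context is not None:
            # Hypothetical restore call; the thread only confirms that the
            # low-level machinery for restoring context exists, not its name.
            self.net.set_context(self.context)
        # Assumed to perform the forward pass, backward pass and SGD update.
        super(StatefulSgdStepper, self).run()
        # Confirmed API: capture the recurrent state for the next chunk.
        self.context = self.net.get_context()
```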