Klaus Greff
The trainer description is not enough, because it (currently) discards "fleeting" information like `current_epoch_nr` and `current_update_nr`. Also, steppers might have some internal state (like the velocity in `MomentumStep`), which would not be...
Thank you for our first feature PR! As mentioned in #36, we agree that this possibility should be integrated somehow. However, passing a `print_function` separately to every hook seems inconvenient and adds...
We don't currently have specific support for truncated BPTT. What you can do is chunk up your sequence and just treat the chunks as separate sequences. That will lose the...
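The chunking workaround could look roughly like the sketch below. It is plain Python and not part of the library; `chunk_sequence` and `chunk_len` are hypothetical names chosen for illustration, and the sequence is assumed to be indexable along its time axis.

``` python
def chunk_sequence(sequence, chunk_len):
    """Split one long sequence into consecutive chunks of at most chunk_len steps.

    Hypothetical helper: each chunk is then fed to training as if it were an
    independent sequence, so no state is carried across chunk boundaries.
    """
    if chunk_len < 1:
        raise ValueError("chunk_len must be at least 1")
    return [sequence[start:start + chunk_len]
            for start in range(0, len(sequence), chunk_len)]


# Usage: a 10-step sequence split into chunks of up to 4 steps.
# The last chunk is shorter because 10 is not a multiple of 4.
long_sequence = list(range(10))
chunks = chunk_sequence(long_sequence, 4)
```

Slicing keeps the time order inside every chunk, and concatenating the chunks gives back the original sequence.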
Streams might be helpful, but I'm not sure we can get around the threading with streams here, since we need to also run next on the iterator while the forward...
Double buffering is currently broken, because it overwrites the input data while the forward/backward pass is running. This is clearly a problem, because we might still need the old values....
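One generic way to avoid that kind of overwrite is to hand buffers back and forth explicitly, so the loader only writes into a buffer the consumer has released. The sketch below uses only the standard library; `double_buffered` and its parameters are hypothetical, and it is not the project's actual data iterator.

``` python
import queue
import threading


def double_buffered(batches, n_buffers=2):
    """Yield batches through a fixed pool of reusable buffers.

    A background thread fills buffers, but only ones taken from the `free`
    queue, i.e. buffers the consumer has finished with.
    """
    free = queue.Queue()
    ready = queue.Queue()
    for _ in range(n_buffers):
        free.put([])  # reusable buffers, modelled here as plain lists

    def loader():
        for batch in batches:
            buf = free.get()   # blocks until a buffer has been released
            buf[:] = batch     # overwrite only a buffer nobody is reading
            ready.put(buf)
        ready.put(None)        # signal the end of the data

    threading.Thread(target=loader, daemon=True).start()

    while True:
        buf = ready.get()
        if buf is None:
            return
        yield buf              # consumer works on this buffer
        free.put(buf)          # resumed: consumer is done, buffer may be reused


# Usage: copy each buffer if it has to outlive the loop iteration.
results = [list(buf) for buf in double_buffered([[1, 2], [3, 4], [5, 6]])]
```

A buffer is valid only until the next item is requested, which is the same contract the forward/backward pass would have to respect.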
How about we (ab)use indexing notation for that:

``` python
_h[1].dot_add_mm(dIa[t], x[t], dWi, transa=True)
_h[2].dot_add_mm(dFa[t], x[t], dWf, transa=True)
_h[3].dot_add_mm(dOa[t], x[t], dWo, transa=True)
_h[4].dot_add_mm(dZa[t], x[t], dWz, transa=True)
```

If `_h[0]` returns...
Ok, that's a fair point. What I don't like about `_h.set_stream(4).dot_add_mm(...)` is that it actually sets the stream, i.e. changes the state of the handler. So all of these would...
Option 4:

``` python
with _h.streams(1):
    _h.dot_add_mm(flat_dH, W, out=flat_in_delta_buffer)
with _h.streams(2):
    _h.dot_mm(flat_dH, flat_input, out=dW, transa=True)
    _h.sum_t(flat_dH, axis=0, out=dbias)
_h.sum_t(flat_dH, axis=0, out=dbias)  # runs on default stream
```

Considering issue 2a...
I think this should be post-release. It is important so it shouldn't be rushed. Let's set up a benchmarking suite first, and do a little bit of profiling. WRT Option3...
Yes! Let's definitely not tackle that before the release.