Klaus Greff comments

Results: 130 comments of Klaus Greff

Describing and serializing objects

The trainer description is not enough, because it (currently) discards "fleeting" information like `current_epoch_nr` and `current_update_nr`. Also, steppers might have some internal state (like the velocity in `MomentumStep`), which would not be...

Allow usage of any function as log printer

Thank you for our first feature PR! As mentioned in #36, we agree on somehow integrating that possibility. However, passing a `print_function` separately to every hook seems inconvenient and adds...

Truncated BPTT

We don't currently have specific support for truncated BPTT. What you can do is chunk up your sequence and treat the chunks as separate sequences. That will lose the...
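The chunking workaround described above can be sketched roughly as follows. This is only an illustration in plain Python: `chunk_sequence` and `chunk_len` are hypothetical names and not part of the library, and the sequence is assumed to be a list of time steps.

``` python
def chunk_sequence(sequence, chunk_len):
    """Split a sequence of time steps into consecutive chunks.

    Hypothetical helper: every chunk has chunk_len steps, except
    possibly the last one, which holds whatever remains.
    """
    return [sequence[start:start + chunk_len]
            for start in range(0, len(sequence), chunk_len)]


# A toy sequence with 10 time steps and 2 features per step.
sequence = [[t, t * 2] for t in range(10)]

# Each chunk would then be fed to training as its own sequence.
chunks = chunk_sequence(sequence, chunk_len=4)
chunk_lengths = [len(chunk) for chunk in chunks]
```

The chunks cover the original sequence without overlap, so concatenating them gives back the full sequence.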

double buffering

Streams might be helpful, but I'm not sure we can get around the threading with streams here, since we also need to run `next` on the iterator while the forward...

double buffering

Double buffering is currently broken, because it overwrites the input data while the forward/backward pass is running. This is clearly a problem, because we might still need the old values....
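One way to avoid this kind of overwrite is to let the loader write only into buffers that the consumer has explicitly handed back. The following is a minimal sketch of that idea using the standard library, not the project's actual implementation; `double_buffered` and all names inside it are hypothetical, and plain lists stand in for the input buffers.

``` python
import queue
import threading


def double_buffered(iterator, buffers):
    """Yield filled buffers while a background thread loads the next one.

    Hypothetical sketch: a buffer is in `free`, in `ready`, or held by
    the consumer, so the loader can never overwrite data in use.
    """
    free = queue.Queue()
    ready = queue.Queue()
    for buf in buffers:
        free.put(buf)
    done = object()  # sentinel marking the end of the data

    def loader():
        for item in iterator:
            buf = free.get()   # blocks until the consumer releases a buffer
            buf[:] = item      # overwrite the buffer contents in place
            ready.put(buf)
        ready.put(done)

    threading.Thread(target=loader, daemon=True).start()

    while True:
        buf = ready.get()
        if buf is done:
            return
        yield buf              # the consumer works on buf here
        free.put(buf)          # released only once the next batch is requested


batches = ([i, i * 10] for i in range(5))
buffers = [[0, 0], [0, 0]]
seen = [list(buf) for buf in double_buffered(batches, buffers)]
```

With two buffers, the loader can prepare one batch ahead while the current batch is still being processed, and it blocks instead of overwriting when both buffers are taken.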

Streams

How about we (ab)use indexing notation for that:

``` python
_h[1].dot_add_mm(dIa[t], x[t], dWi, transa=True)
_h[2].dot_add_mm(dFa[t], x[t], dWf, transa=True)
_h[3].dot_add_mm(dOa[t], x[t], dWo, transa=True)
_h[4].dot_add_mm(dZa[t], x[t], dWz, transa=True)
```

If `_h[0]` returns...

Streams

Ok, that's a fair point. What I don't like about `_h.set_stream(4).dot_add_mm(...)` is that it actually sets the stream, i.e. changes the state of the handler. So all of these would...

Streams

Option 4:

``` python
with _h.streams(1):
    _h.dot_add_mm(flat_dH, W, out=flat_in_delta_buffer)
with _h.streams(2):
    _h.dot_mm(flat_dH, flat_input, out=dW, transa=True)
    _h.sum_t(flat_dH, axis=0, out=dbias)
_h.sum_t(flat_dH, axis=0, out=dbias)  # runs on default stream
```

Considering issue 2a...

Streams

I think this should be post-release. It is important, so it shouldn't be rushed. Let's set up a benchmarking suite first and do a little bit of profiling. WRT Option 3...

All buffers have to be of the same dtype

Yes! Let's definitely not tackle that before the release.