Yegor Tkachenko
I'll start working on this once we have finished the tutorials and published the package.
I think it's best if we have one general sampling scheme without special cases - no history can be the default, i.e. (1,0,0).
Yep, I think dynamic visualization of (1) the loss, (2) the reward per episode, and (3) the average Q-values would be awesome. Any ideas on the best way to implement that?
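One possible way to do the dynamic visualization, as a rough sketch rather than a proposal for the package's actual API: keep a small monitor object that stores a moving average of each metric and redraws three curves on demand. The class name `TrainingMonitor`, its methods, the metric keys, and the use of matplotlib are all assumptions made for illustration.

```python
from collections import deque


class TrainingMonitor:
    """Hypothetical helper: tracks loss, episode reward, and average Q-value."""

    def __init__(self, window=100):
        # Smoothed curves that get plotted, one list per metric.
        self.history = {"loss": [], "reward": [], "avg_q": []}
        # Last `window` raw values per metric, used for the moving average.
        self._recent = {name: deque(maxlen=window) for name in self.history}
        self._fig = None
        self._lines = {}

    def record(self, name, value):
        """Store the moving average of the last `window` values of a metric."""
        recent = self._recent[name]
        recent.append(float(value))
        self.history[name].append(sum(recent) / len(recent))

    def refresh(self):
        """Redraw all curves; matplotlib is imported lazily (assumed installed)."""
        import matplotlib.pyplot as plt

        if self._fig is None:
            plt.ion()  # interactive mode, so drawing does not block training
            self._fig, axes = plt.subplots(3, 1)
            for ax, name in zip(axes, self.history):
                (self._lines[name],) = ax.plot([], [])
                ax.set_ylabel(name)
        for name, line in self._lines.items():
            ys = self.history[name]
            line.set_data(list(range(len(ys))), ys)
            line.axes.relim()
            line.axes.autoscale_view()
        self._fig.canvas.draw_idle()
        plt.pause(0.001)  # give the GUI event loop a moment to repaint
```

In a training loop one would call `record("loss", ...)` after each update and `record("reward", ...)` at the end of each episode, and call `refresh()` only every few hundred steps so that plotting does not slow training down noticeably.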
Ah, got it. In my experience, the evaluation part passes too quickly, so the visualization appears only for a very short time - training might take a pretty long time, and seeing the...