Jiayi Weng comments

Results: 303 comments of Jiayi Weng

Improve discrete control offline RL benchmark

Maybe we can add another way to save/restore a ReplayBuffer. As I remember, numpy's own compression is much more efficient than pickle/hdf5 (according to my experiments at the time).
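
A minimal sketch of what a numpy-based save/restore could look like, assuming the buffer's contents can be represented as a plain dict of arrays. The helper names `save_buffer_npz` and `load_buffer_npz` and the array keys are hypothetical and not part of any existing ReplayBuffer API. Only `np.savez_compressed` and `np.load` are real numpy calls. How the size compares with pickle depends heavily on the data.

```python
import io
import pickle

import numpy as np


def save_buffer_npz(arrays, file):
    """Write a dict of arrays to `file` as a compressed .npz archive."""
    np.savez_compressed(file, **arrays)


def load_buffer_npz(file):
    """Read every array back from a .npz archive into a plain dict."""
    with np.load(file) as data:
        return {key: data[key] for key in data.files}


# Hypothetical stand-in for the contents of a replay buffer.
rng = np.random.default_rng(0)
buffer_arrays = {
    "obs": np.zeros((1000, 84, 84), dtype=np.uint8),
    "act": rng.integers(0, 4, size=1000),
    "rew": rng.random(1000).astype(np.float32),
    "done": np.zeros(1000, dtype=bool),
}

# An in-memory file keeps the example self-contained; a path works as well.
npz_file = io.BytesIO()
save_buffer_npz(buffer_arrays, npz_file)
npz_size = len(npz_file.getvalue())

npz_file.seek(0)
restored = load_buffer_npz(npz_file)

# Size of the same data when pickled without compression, for comparison.
pickle_size = len(pickle.dumps(buffer_arrays, protocol=pickle.HIGHEST_PROTOCOL))
```

In this toy case the observations are all zeros, so they compress very well. Real frames will compress less, and the comparison should be measured on the actual dataset.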

Improve discrete control offline RL benchmark

Not sure what happened; could you please send me the code?

Improve discrete control offline RL benchmark

Is it possible to use an empty dataset to reproduce this result? (unrelated to rl-unplugged, because I need quite a long time to download one file...)

Improve discrete control offline RL benchmark

I think the reason is that, when we developed the RBM, we assumed the buffers in the input buffer list were all uninitialized: ```python [ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(),...

AsyncCollector seems not to collect asynchronous

Hmm, the current code does what you said. However, the best way is to deal with reset/step together. If the underlying env has an auto-reset env wrapper, i.e., `step(*) == reset()`...
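
To illustrate the auto-reset idea mentioned above, here is a rough sketch under one assumed convention: once an episode has ended, the next `step` call ignores its action and behaves like `reset()`. Both `CountdownEnv` and `AutoResetWrapper` are hypothetical toy classes written for this example. They are not the wrappers used by any particular library, and real implementations may choose a different convention.

```python
class CountdownEnv:
    """Toy environment (hypothetical): an episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoResetWrapper:
    """Treat reset and step together: a step after `done` acts as a reset."""

    def __init__(self, env):
        self.env = env
        self._needs_reset = True  # the very first step also resets

    def reset(self):
        self._needs_reset = False
        return self.env.reset()

    def step(self, action):
        if self._needs_reset:
            # The action is ignored here; the caller gets the initial observation.
            obs = self.reset()
            return obs, 0.0, False, {"reset": True}
        obs, rew, done, info = self.env.step(action)
        self._needs_reset = done
        return obs, rew, done, info


env = AutoResetWrapper(CountdownEnv(length=2))
# The caller never calls reset() explicitly; it only ever calls step().
trace = [env.step(0)[0] for _ in range(6)]
```

With this convention a collector only needs a single code path, since it never has to decide between calling `reset` and `step`.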

Implementation design issues in SubprocVectorEnv

Hello my old friend! Nice to hear that great news! I am changing this to support EnvPool async mode little by little. In EnvPool, both the `step` and `reset` are...

Implement MBPO (#16) and REDQ

Sorry for the delay. Could you please kindly answer the following questions because I'm not quite familiar with these two algorithms and need some context? 1. What's the difference between...

Implement MBPO (#16) and REDQ

> A solution to optimize the sample method in my mind is to figure out or keep available indices of _meta in an array and sample with a single np.random.choice...
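
The quoted suggestion could be sketched roughly as follows, assuming availability is tracked in a boolean mask over buffer slots. `IndexPool` and its methods are hypothetical names used only for illustration, and `_meta` itself is not modelled here. The only library calls used are `np.flatnonzero` and `Generator.choice`.

```python
import numpy as np


class IndexPool:
    """Track which buffer slots are available and sample from them in one call."""

    def __init__(self, capacity):
        self.valid = np.zeros(capacity, dtype=bool)

    def mark(self, idx, available=True):
        """Mark one index or an array of indices as available (or not)."""
        self.valid[idx] = available

    def sample(self, batch_size, rng):
        """Draw `batch_size` indices with replacement using a single choice call."""
        available = np.flatnonzero(self.valid)
        if available.size == 0:
            return np.empty(0, dtype=np.int64)
        return rng.choice(available, size=batch_size, replace=True)


pool = IndexPool(capacity=10)
pool.mark([2, 5, 7])
rng = np.random.default_rng(0)
batch = pool.sample(100, rng)
```

Here the available indices are recomputed on every call. Caching them in an array that is updated on insert would avoid that scan, at the cost of extra bookkeeping.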

What paper or reference is the RNN implementation trying to replicate?

There are known issues pointed out in #486. Unfortunately, I won't have time to fix them for another two months (after graduation)... Feel free to submit PRs, and I'm really sorry about...

Episode start signal not used in RNN for on-policy algorithms

Thanks! I'll take a look when I finish my database assignment...