spacegoing

Results 37 comments of spacegoing

@roubkar Hi guys, I ran into a similar issue. I have different metrics with different steps, so I can't simply filter by duration. I think it would be great if we could...

do we have a workaround for this?

Just FYI, one simple reason might be that training batches are shuffled by default. Setting `batch.split(batch_size, shuffle=False, merge_last=True)` solves my problem.
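To illustrate why shuffling matters here, below is a minimal, library-free sketch of how a `split`-style minibatch iterator behaves with and without shuffling. `split_indices` is a hypothetical helper written for this example, not tianshou's implementation; its `shuffle` and `merge_last` parameters only imitate the semantics of the call quoted above, and `seed` is an extra assumption added for reproducibility.

```python
import random
from typing import Iterator, List, Optional


def split_indices(
    length: int,
    size: int,
    shuffle: bool = True,
    merge_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield minibatches of indices in range(length).

    Hypothetical stand-in for a Batch.split-like iterator (an assumption,
    not the library's code).
    """
    indices = list(range(length))
    if shuffle:
        # Shuffling destroys the temporal order of the collected steps.
        random.Random(seed).shuffle(indices)
    start = 0
    while start < length:
        end = start + size
        # With merge_last, a short trailing remainder is folded into this chunk.
        if merge_last and length - end < size:
            end = length
        yield indices[start:end]
        start = end


# With shuffle=False the minibatches keep the original step order.
ordered = list(split_indices(10, 4, shuffle=False, merge_last=True))
unmerged = list(split_indices(10, 4, shuffle=False, merge_last=False))
```

If per-step quantities are later matched up by position, iterating in the original order is one way to rule out ordering as the cause of a mismatch.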

@MischaPanch Yes. I think I have more or less fixed the on-policy algorithm (PPO). It works correctly in my case, but I need more bad cases to double-check. AFAIK https://github.com/thu-ml/tianshou/issues/486 is the only on-policy...

This is super helpful! I'll let you know once I've finished checking.

@MischaPanch Hi bro, I don't know where to raise this issue properly: may I suggest we collect T+1 steps in the collector for the sake of value / GAE calculation,...
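For context on why an extra step helps, here is a minimal sketch of GAE over a rollout of T transitions, assuming the value estimate of the (T+1)-th state is available for bootstrapping. The function name, the list-based inputs, and the treatment of `dones` as true terminations (not time-limit truncations) are assumptions for illustration, not tianshou's collector or its GAE code.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Compute GAE advantages for T steps given T+1 value estimates.

    values[t] is V(s_t) for t = 0..T; values[T] is the bootstrap value of the
    state reached after the last collected transition.
    """
    num_steps = len(rewards)
    if len(values) != num_steps + 1 or len(dones) != num_steps:
        raise ValueError("expected T rewards, T dones and T+1 values")
    advantages = [0.0] * num_steps
    running = 0.0
    for t in reversed(range(num_steps)):
        # No bootstrapping across a terminal transition.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


# Two steps, no termination, bootstrap value 10 for the (T+1)-th state.
adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 10.0], [False, False], gamma=1.0, lam=1.0)
```

Without `values[T]`, the last delta has nothing to bootstrap from unless the episode actually terminated there.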

@MischaPanch Thanks Michael, that issue is indeed a bug. The T+1 proposal I raised is also related to https://github.com/thu-ml/tianshou/issues/886. Remember the discussion we had? I'll do a quick summary here...

@MischaPanch Thanks for your reply. I believe this is more of an implementation choice than a mathematical issue: the logprob_\pi_old is computed during rollout, only logprob_\pi_new needs to be computed...
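As a rough illustration of the point about reusing rollout-time log-probabilities, the sketch below computes the standard PPO clipped surrogate from stored old log-probs and freshly computed new ones. The function name, the plain-float inputs, and `eps_clip=0.2` are assumptions for this example; it is not taken from tianshou's PPO code.

```python
import math
from typing import Sequence


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps_clip: float = 0.2,
) -> float:
    """Mean PPO clipped objective.

    logp_old is assumed to have been stored at rollout time, so only
    logp_new has to be recomputed at each update.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), taken in log space.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Before any update the two policies coincide, so every ratio is exactly 1.
objective = clipped_surrogate([-1.0, -2.0], [-1.0, -2.0], [1.0, 3.0])
```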

@MischaPanch @Trinkle23897 My concern is whether this will add engineering complexity to tianshou. I think there is a trade-off between ease of use and flexibility for development. I don't know tianshou's...

@MischaPanch LOL, I couldn't agree more. For some reason I have to use RLlib, and it has been a pain in my ass every day for more than a year. > Btw, me...