Yasuhiro Fujita
> I'm facing the problem of having a multi-discrete action space with different number of actions along each dimension. For example, dimension 1 has 5 actions, dimension 2 has 3...
A perhaps easier but less clean workaround is to model it as a joint distribution of same-sized categorical distributions using `Independent` and `Categorical` but set logits for unused categories to...
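A rough sketch of that workaround in PyTorch follows. The comment above is cut off before it says what the unused logits are set to, so the large negative `mask_value` here is an assumption, and `masked_multi_discrete` is a hypothetical helper name, not part of any library.

```python
import torch
from torch.distributions import Categorical, Independent


def masked_multi_discrete(logits, n_actions_per_dim, mask_value=-1e9):
    """Build a joint distribution over a multi-discrete action space.

    logits: tensor of shape (batch, n_dims, max_actions).
    n_actions_per_dim: number of valid actions in each dimension.
    mask_value: assumed value for unused logits (a large negative number).
    """
    max_actions = logits.shape[-1]
    n_actions = torch.as_tensor(n_actions_per_dim, device=logits.device)
    # valid[d, a] is True when action a exists in dimension d.
    valid = torch.arange(max_actions, device=logits.device)[None, :] < n_actions[:, None]
    # Unused categories get (numerically) zero probability.
    masked = logits.masked_fill(~valid, mask_value)
    # Treat the n_dims axis as one joint event, so log_prob sums over it.
    return Independent(Categorical(logits=masked), 1)


torch.manual_seed(0)
dist = masked_multi_discrete(torch.randn(4, 2, 5), [5, 3])
actions = dist.sample()            # shape (4, 2)
log_prob = dist.log_prob(actions)  # shape (4,)
```

With the example from the question, dimension 1 keeps all 5 actions and dimension 2 only ever samples from its first 3.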
You are right, it seems to be a discrepancy from the official implementation. I do not remember whether I compared against it; probably not.
I guess from the error message that it is caused by the shape of your network's output.
```
File "/Users/behradkoohy/sumo-scratchpad/RESCO/venv/lib/python3.8/site-packages/pfrl/action_value.py", line 70, in evaluate_actions
    return self.q_values.gather(dim=1, index=index).flatten()
RuntimeError: Size does...
```
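To illustrate the kind of shape that `gather(dim=1, index=index)` expects, here is a minimal sketch in plain PyTorch. It assumes Q-values of shape `(batch, n_actions)` and an index of shape `(batch, 1)`; how the index is built and the extra dimension in `q_bad` are assumptions for illustration, not taken from your code.

```python
import torch

batch_size, n_actions = 4, 3
actions = torch.tensor([0, 2, 1, 0])
index = actions.long().unsqueeze(1)  # shape (batch, 1)

# Expected case: one row of Q-values per sample.
q_ok = torch.arange(12.0).reshape(batch_size, n_actions)
selected = q_ok.gather(dim=1, index=index).flatten()  # shape (batch,)

# Hypothetical bad case: the network emits an extra dimension,
# so q_values and index no longer have the same number of dimensions.
q_bad = q_ok.unsqueeze(1)  # shape (batch, 1, n_actions)
try:
    q_bad.gather(dim=1, index=index)
    raised = False
except RuntimeError:
    raised = True
```

Printing the shape of your model's output for a single batch and comparing it with `(batch, n_actions)` should show whether this is the cause.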
Thank you. The plot I pasted above is before that commit. I'll try again with the current master branch. What about my first question on where these values come from?...
I mean `N` and `N'` in (3), not `n` in (4). `ImplicitQuantileAgent.num_tau_samples` and `ImplicitQuantileAgent.num_tau_prime_samples` correspond to `N` and `N'`, respectively, correct?
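To make clear what I mean by `N` and `N'`, here is a sketch of the quantile Huber loss as I understand it from the paper: a sum over the `N` samples of the current return distribution, averaged over the `N'` target samples. The function and argument names are hypothetical, and this is neither Dopamine's code nor the agent's actual implementation.

```python
import torch


def quantile_huber_loss(z, taus, target_z, kappa=1.0):
    """Per-sample loss over all N x N' pairs of quantile samples.

    z:        (batch, N)  quantile values Z_tau(x, a) at the sampled taus
    taus:     (batch, N)  the sampled quantile fractions in [0, 1]
    target_z: (batch, N') target values r + gamma * Z_tau'(x', a')
    """
    # Pairwise TD errors delta[b, i, j] = target_z[b, j] - z[b, i].
    delta = target_z[:, None, :] - z[:, :, None]
    abs_delta = delta.abs()
    huber = torch.where(
        abs_delta <= kappa,
        0.5 * delta ** 2,
        kappa * (abs_delta - 0.5 * kappa),
    )
    # Asymmetric quantile weight |tau - 1{delta < 0}|.
    weight = (taus[:, :, None] - (delta.detach() < 0).float()).abs()
    rho = weight * huber / kappa
    # Sum over the N current samples, average over the N' target samples.
    return rho.sum(dim=1).mean(dim=1)


torch.manual_seed(0)
n, n_prime = 8, 4  # stand-ins for N and N'
loss = quantile_huber_loss(torch.randn(2, n), torch.rand(2, n), torch.randn(2, n_prime))
zero = quantile_huber_loss(torch.zeros(2, n), torch.rand(2, n), torch.zeros(2, n_prime))
```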
I see. Thank you for clarifying it.
Does Dopamine follow the up-to-30-noop evaluation protocol used in the paper? I cannot find code that sends noop actions after reset.
Because sticky actions are disabled by the config file and up-to-30-noop is not implemented, I suspect that the current evaluation protocol of `implicit_quantile_icml.gin` is more deterministic and thus easier than...
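For reference, this is roughly what I mean by up-to-30-noop: a wrapper that sends a random number of no-op actions after each reset. It is only a sketch, not Dopamine code. It assumes action `0` is the no-op, a classic Gym-style `step` that returns a 4-tuple, and that zero no-ops is allowed; `NoopResetEnv` and `CountingEnv` are names chosen for illustration.

```python
import random


class NoopResetEnv:
    """Send between 0 and noop_max no-op actions after every reset."""

    def __init__(self, env, noop_max=30, noop_action=0, seed=None):
        self.env = env
        self.noop_max = noop_max
        self.noop_action = noop_action  # assumed index of NOOP
        self.rng = random.Random(seed)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.rng.randint(0, self.noop_max)):
            obs, _, done, _ = self.env.step(self.noop_action)
            if done:
                # Episode ended during the no-ops; start over.
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)


class CountingEnv:
    """Toy environment whose observation is the number of steps taken."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}


env = NoopResetEnv(CountingEnv(), noop_max=30, seed=0)
starts = [env.reset() for _ in range(100)]
```

With this in place, each episode starts from a different state, which removes much of the determinism I am worried about.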
I ran `implicit_quantile_icml.gin` again with the fix. The results are much better now, but on Breakout and Seaquest they have not reached the paper's scores yet. Any ideas?