Yasuhiro Fujita comments

90 comments by Yasuhiro Fujita

MultiDiscrete action spaces

> I'm facing the problem of having a multi-discrete action space with different number of actions along each dimension. For example, dimension 1 has 5 actions, dimension 2 has 3...

A perhaps easier but less clean workaround is to model it as a joint distribution of same-sized categorical distributions using `Independent` and `Categorical` but set logits for unused categories to...
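A minimal PyTorch sketch of this workaround for the 5-and-3 example in the question. The reply is cut off before saying what the unused logits are set to; this sketch assumes the most negative finite float, so those categories get zero probability. `action_sizes`, `mask`, and `make_distribution` are hypothetical names.

```python
import torch
from torch.distributions import Categorical, Independent

# Hypothetical setup: dimension 1 has 5 actions, dimension 2 has 3.
action_sizes = [5, 3]
max_size = max(action_sizes)

# mask[i, j] is True when category j exists for action dimension i; shape (2, 5).
mask = torch.arange(max_size).unsqueeze(0) < torch.tensor(action_sizes).unsqueeze(1)


def make_distribution(logits: torch.Tensor) -> Independent:
    """logits: (batch, num_dims, max_size) raw outputs of the policy network."""
    # Assumption: unused categories get the most negative finite float,
    # so their probability is exactly zero after the softmax.
    masked = logits.masked_fill(~mask, torch.finfo(logits.dtype).min)
    # Independent treats the num_dims categoricals as a single joint event.
    return Independent(Categorical(logits=masked), 1)


logits = torch.randn(4, len(action_sizes), max_size)
dist = make_distribution(logits)
actions = dist.sample()             # shape (4, 2); actions[:, 1] is always < 3
log_probs = dist.log_prob(actions)  # shape (4,), summed over the dimensions
```

The network still outputs `max_size` logits per dimension; the padded ones are simply never sampled.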

Discrepancy in SAC on entropy coefficient update

You are right, it seems to be a discrepancy from the official implementation. I do not remember whether I made a comparison, maybe not.

RuntimeError: Size does not match at dimension 0 expected index [32, 1] to be smaller than self [1, 3] apart from dimension 1

I guess from the error message that it is due to a shape issue with your network's output.

```
  File "/Users/behradkoohy/sumo-scratchpad/RESCO/venv/lib/python3.8/site-packages/pfrl/action_value.py", line 70, in evaluate_actions
    return self.q_values.gather(dim=1, index=index).flatten()
RuntimeError: Size does...
```
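A hypothetical reproduction with plain `torch.gather`, using the shapes from the error message: the index has one row per sampled transition (32), so the Q-value tensor needs a matching batch dimension. That the network collapses its batch dimension to 1 is an assumption about the cause.

```python
import torch

batch_size, n_actions = 32, 3
actions = torch.randint(n_actions, (batch_size,))
index = actions.unsqueeze(1)  # shape (32, 1), as in the error message

# Expected: one row of Q-values per transition in the minibatch.
good_q = torch.randn(batch_size, n_actions)
picked = good_q.gather(dim=1, index=index).flatten()  # shape (32,)

# Assumed cause: the batch dimension was lost, giving shape (1, 3).
bad_q = torch.randn(1, n_actions)
raised = False
try:
    bad_q.gather(dim=1, index=index)
except RuntimeError:
    raised = True  # same kind of size-mismatch failure as in the traceback
```

So it is worth checking that the network returns a tensor of shape `(batch_size, n_actions)` when given a batch of observations.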

Reproducing the scores reported by the IQN paper

Thank you. The plot I pasted above was produced before that commit. I'll try again with the current master branch. What about my first question on where these values come from?...

Reproducing the scores reported by the IQN paper

I mean `N` and `N'` in (3), not `n` in (4). `ImplicitQuantileAgent.num_tau_samples` and `ImplicitQuantileAgent.num_tau_prime_samples` correspond to `N` and `N'`, respectively, correct?
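For reference, a NumPy sketch of the sampled quantile Huber loss from the IQN paper for a single transition, showing where the two sample counts enter: the N quantile estimates for the current state-action pair are compared pairwise with the N' target samples, summed over the former and averaged over the latter. The function name and argument layout are hypothetical; under the correspondence asked about above, N would be `num_tau_samples` and N' would be `num_tau_prime_samples`.

```python
import numpy as np


def quantile_huber_loss(current, target, taus, kappa=1.0):
    """Sketch of the IQN loss for one transition.

    current: (N,)  Z_{tau_i}(x_t, a_t) for N sampled taus
    target:  (N',) r_t + gamma * Z_{tau'_j}(x_{t+1}, a*) for N' sampled tau'
    taus:    (N,)  the tau_i used to compute `current`
    """
    # Pairwise TD errors delta_ij = target_j - current_i, shape (N, N').
    delta = target[None, :] - current[:, None]
    abs_delta = np.abs(delta)
    # Huber loss L_kappa(delta).
    huber = np.where(abs_delta <= kappa, 0.5 * delta ** 2,
                     kappa * (abs_delta - 0.5 * kappa))
    # Asymmetric quantile weight |tau_i - 1{delta_ij < 0}|.
    weight = np.abs(taus[:, None] - (delta < 0).astype(float))
    rho = weight * huber / kappa
    # Sum over the N current samples, average over the N' target samples.
    return rho.sum(axis=0).mean()
```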

Reproducing the scores reported by the IQN paper

I see. Thank you for clarifying it.

Reproducing the scores reported by the IQN paper

Does dopamine follow the up-to-30-noop evaluation protocol used in the paper? I cannot find code that sends noop actions after reset.
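A minimal sketch of what up-to-30 no-op starts means at evaluation time: after each reset, a random number of NOOP actions is executed before the agent takes control. Following the convention of common Atari wrappers, this sketch samples the count uniformly from 1 to 30 and assumes action 0 is NOOP; the wrapper class and its env interface are hypothetical, not Dopamine's API.

```python
import random


class NoopResetEnv:
    """Hypothetical wrapper: each reset is followed by 1..noop_max no-op steps.

    Assumes env.reset() -> obs and env.step(action) -> (obs, reward, done, info).
    """

    def __init__(self, env, noop_max=30, noop_action=0, rng=None):
        self.env = env
        self.noop_max = noop_max
        self.noop_action = noop_action
        self.rng = rng or random.Random()

    def reset(self):
        obs = self.env.reset()
        # Sample the number of no-ops uniformly from 1..noop_max.
        for _ in range(self.rng.randint(1, self.noop_max)):
            obs, _, done, _ = self.env.step(self.noop_action)
            if done:
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)


class CountingEnv:
    """Toy env whose observation is the number of steps since the last reset."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        return self.steps, 0.0, False, {}


env = NoopResetEnv(CountingEnv(), rng=random.Random(0))
first_obs = env.reset()  # equals the number of no-ops taken
```

Because the starting state varies from episode to episode, a policy cannot simply replay one memorized action sequence.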

Reproducing the scores reported by the IQN paper

Because sticky actions are disabled by the config file and up-to-30-noop is not implemented, I suspect that the current evaluation protocol of `implicit_quantile_icml.gin` is more deterministic and thus easier than...
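For comparison, a simplified sketch of what sticky actions do: with some probability, the previously executed action is repeated instead of the agent's choice. The 0.25 repeat probability is the commonly used value and is an assumption here; the ALE applies stickiness inside the emulator at every frame, whereas this hypothetical wrapper applies it once per agent step.

```python
import random


class StickyActionEnv:
    """Hypothetical wrapper: with probability repeat_prob, repeat the last action.

    Assumes env.reset() -> obs and env.step(action) -> (obs, reward, done, info).
    """

    def __init__(self, env, repeat_prob=0.25, rng=None):
        self.env = env
        self.repeat_prob = repeat_prob
        self.rng = rng or random.Random()
        self.last_action = 0  # assume NOOP (action 0) before the first step

    def reset(self):
        self.last_action = 0
        return self.env.reset()

    def step(self, action):
        # With probability repeat_prob, ignore the agent's choice this step.
        if self.rng.random() < self.repeat_prob:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)


class RecordingEnv:
    """Toy env that records the actions actually executed."""

    def __init__(self):
        self.executed = []

    def reset(self):
        self.executed = []
        return 0

    def step(self, action):
        self.executed.append(action)
        return 0, 0.0, False, {}


chosen = [1, 2, 3, 1, 2, 3, 1, 2]
inner = RecordingEnv()
env = StickyActionEnv(inner, repeat_prob=0.25, rng=random.Random(0))
env.reset()
for a in chosen:
    env.step(a)
```

With stickiness disabled, the executed actions always equal the chosen ones, which is part of why evaluation becomes more deterministic.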

Reproducing the scores reported by the IQN paper

I ran `implicit_quantile_icml.gin` again with the fix. The results are much better now, but on Breakout and Seaquest they still haven't reached the paper's scores. Any ideas?