Gym.jl

Standard action space for DiscreteSpace

Open: darsnack opened this issue 6 years ago • 4 comments

Currently, `DiscreteSpace` is defined as {1, ..., n} (as it should be), but the lines in CartPole.jl that map {1, 2} --> {-1, 1} are commented out, and so is the assertion on the action. Is there a reason for this? Someone has already written the code to adapt the `step!` logic to a {1, ..., n} action space, so why aren't we using it?
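For reference, here is a minimal sketch of what a range check plus the {1, 2} --> {-1, 1} mapping might look like. `DiscreteActions` and `action_to_direction` are hypothetical names standing in for Gym.jl's `DiscreteSpace` and the commented-out CartPole.jl lines; the actual code may differ.

```julia
# Hypothetical stand-in for Gym.jl's DiscreteSpace: the action set {1, ..., n}.
struct DiscreteActions
    n::Int
end

# Lets `action in space` check membership in {1, ..., n}.
Base.in(a::Integer, s::DiscreteActions) = 1 <= a <= s.n

# Hypothetical CartPole helper: validate the action, then map {1, 2}
# to a force direction in {-1, 1}.
function action_to_direction(space::DiscreteActions, action::Integer)
    @assert space.n == 2 "CartPole expects a two-action space"
    @assert action in space "action $action is not in {1, ..., $(space.n)}"
    return action == 1 ? -1 : 1
end

space = DiscreteActions(2)
action_to_direction(space, 1)  # -1
action_to_direction(space, 2)  #  1
```

With the assertion active, an out-of-range action such as 3 raises an `AssertionError` instead of silently producing a force.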

If there is a reason, can we settle what the standard action space should be?

darsnack · Apr 23 '19 21:04

Hi @darsnack! With a discrete space, the environment is not differentiable, because we extract the index of the chosen action and pass it to `step!`. Mapping {1, 2} --> {-1, 1} is just a hack we found to turn CartPole's action space into a continuous one. In the long term, we would like to use a discrete space and still keep the environment differentiable; alternatively, a hack that maps {1, ..., n} --> some continuous space would also help.

tejank10 · Apr 28 '19 05:04

I think the natural discrete-to-continuous mapping is {1, ..., n} --> [1.0, n]. Beyond that, the interpretation is specific to each environment. For example, CartPole would use the standard mapping {1, 2} --> [1.0, 2.0] and then compute `force = 2f0 * (continuous_action - 1f0) - 1f0`, which sends 1.0 to -1 and 2.0 to 1. Is this along the lines of what you were thinking?
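To make the arithmetic concrete, here is a minimal sketch assuming two hypothetical helpers: `to_continuous` (the standard {1, ..., n} --> [1.0, n] embedding) and `cartpole_force` (the CartPole-specific formula proposed above). Because the force is affine in the continuous action, its derivative is the constant 2 everywhere on [1.0, 2.0].

```julia
# Hypothetical helper: embed a discrete action a in {1, ..., n} into the interval [1.0, n].
function to_continuous(a::Integer, n::Integer)
    @assert 1 <= a <= n "action $a is not in {1, ..., $n}"
    return Float32(a)
end

# CartPole-specific map from [1.0, 2.0] to a force in [-1.0, 1.0], as proposed above.
cartpole_force(continuous_action::Real) = 2f0 * (continuous_action - 1f0) - 1f0

cartpole_force(to_continuous(1, 2))  # -1.0f0
cartpole_force(to_continuous(2, 2))  #  1.0f0
cartpole_force(1.5f0)                #  0.0f0, the midpoint gives zero force
```

Keeping the embedding generic and putting the environment-specific scaling inside each environment matches the split described in the comment.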

darsnack · Apr 28 '19 18:04

Yes, it depends on the environment. Ideally, I would like to keep each environment's discrete action space as it is and introduce a black box between the model and `step!` that passes the gradient back through the index the action value came from. The hack you suggested should also work. A continuous action space runs from -inf to inf, with negative and positive values equally likely, which makes it a natural target for a discrete space of size 2. With the mapping {1, 2} --> [1.0, 2.0], I assume we would shift the origin to 1.5, so that anything below it rounds to 1 and anything above it rounds to 2.
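The rounding step described here could be sketched as follows, assuming a hypothetical `to_discrete` helper. Clamping before rounding keeps any real-valued output, including ±Inf, inside {1, ..., n}. This covers only the forward pass; passing the gradient back through the chosen index (the "black box" idea) would need something extra, such as a custom adjoint, and is not shown.

```julia
# Hypothetical helper: map a continuous model output back to a discrete action in {1, ..., n}.
# Clamp into [1, n] first so very large or infinite outputs still land on a valid action,
# then round to the nearest integer. For n = 2 the decision boundary sits at 1.5.
# Note: Julia's default rounding sends an exact tie such as 1.5 to the even integer, 2.
to_discrete(x::Real, n::Integer) = round(Int, clamp(x, 1, n))

to_discrete(1.2, 2)   # 1
to_discrete(1.7, 2)   # 2
to_discrete(-Inf, 2)  # 1
to_discrete(Inf, 2)   # 2
```

The same helper works unchanged for larger n, where each integer k owns the interval (k - 0.5, k + 0.5) up to tie-breaking.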

tejank10 · Apr 29 '19 11:04

I've been thinking about this recently. Should we set up an experimental Zygote branch that uses custom adjoints to implement differentiable `DiscreteSpace`s?

darsnack · May 21 '19 02:05