Environment Step: float or int?

Open dtitov opened this issue 3 years ago • 6 comments

Hi,

I'm trying to create my own agent for LunarLander-v2, following the CartPole-v0 example here: https://github.com/aunum/gold/tree/master/pkg/v1/agent/reinforce

I've now come to implementing the "step" function, and I realized that the Env.Step() function takes an int as its argument. That may be what CartPole-v0 needs, but LunarLander-v2's action is a float: https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py#L111

# Action is two floats [main engine, left-right engines].
# Main engine: -1..0 off, 0..+1 throttle from 50% to 100% power. Engine can't work with less than 50% power.
# Left-right:  -1.0..-0.5 fire left engine, +0.5..+1.0 fire right engine, -0.5..0.5 off

Pretty much everywhere in the gold library, anything action-related is an int.

How should it be mapped to the continuous environments? Or is it not possible at the moment? If so, would it be difficult to implement it?

dtitov avatar Aug 12 '21 17:08 dtitov

Hm, my bad. I've just re-read the action definition here: it's given as intervals, e.g. -1..0 off, which means that integers can also be used in this case. I still wonder though if there are other environments where the action actually has to be a real value... 🤔

dtitov avatar Aug 12 '21 17:08 dtitov
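For context, Gym registers two variants of this environment: LunarLander-v2 takes a single discrete action in {0, 1, 2, 3}, while LunarLanderContinuous-v2 takes the two-float vector quoted above. The sketch below shows one way an int-only agent could still drive the two-float action, by mapping each discrete choice to a point inside the quoted intervals. The helper `continuousAction` is hypothetical and not part of gold, and the numbering (0 = do nothing, 1 = left engine, 2 = main engine, 3 = right engine) is assumed to follow Gym's discrete variant.

```go
package main

import "fmt"

// continuousAction is a hypothetical helper (not part of gold). It maps a
// discrete action index to the [main engine, left-right engines] vector
// described in the quoted lunar_lander.py comment.
func continuousAction(discrete int) ([2]float64, error) {
	switch discrete {
	case 0: // do nothing: main engine off, side engines off
		return [2]float64{0, 0}, nil
	case 1: // fire left engine: second component in -1.0..-0.5
		return [2]float64{0, -1}, nil
	case 2: // fire main engine at full throttle: first component in 0..+1
		return [2]float64{1, 0}, nil
	case 3: // fire right engine: second component in +0.5..+1.0
		return [2]float64{0, 1}, nil
	default:
		return [2]float64{}, fmt.Errorf("unknown discrete action %d", discrete)
	}
}

func main() {
	for a := 0; a < 4; a++ {
		v, err := continuousAction(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(a, v)
	}
}
```

This only covers four fixed points of the action space, so partial throttle is out of reach; it is a workaround, not continuous control.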

Hey @dtitov, yeah I believe actions are currently only discrete, and I had planned on supporting continuous in the future. If you're up for it, feel free to add, although I think some of this would be simpler once generics are available.

pbarker avatar Aug 12 '21 18:08 pbarker
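As an illustration of why generics could help here, the following is a rough sketch (requiring Go 1.18 or later) of an environment interface parameterized by its action type. All names (`Env`, `discreteEnv`, `continuousEnv`, `runOnce`) are hypothetical and do not reflect gold's actual API; the rewards are placeholders.

```go
package main

import "fmt"

// Env is a hypothetical environment interface whose action type A can be
// an int (discrete) or a []float64 (continuous).
type Env[A any] interface {
	Step(action A) (reward float64, done bool, err error)
}

// discreteEnv accepts one of n integer actions.
type discreteEnv struct{ n int }

func (e discreteEnv) Step(action int) (float64, bool, error) {
	if action < 0 || action >= e.n {
		return 0, false, fmt.Errorf("action %d out of range [0, %d)", action, e.n)
	}
	return 1, false, nil // placeholder reward
}

// continuousEnv accepts a vector of dims floats.
type continuousEnv struct{ dims int }

func (e continuousEnv) Step(action []float64) (float64, bool, error) {
	if len(action) != e.dims {
		return 0, false, fmt.Errorf("got %d components, want %d", len(action), e.dims)
	}
	return 1, false, nil // placeholder reward
}

// runOnce takes a single step and works for any action type.
func runOnce[A any](env Env[A], action A) (float64, error) {
	reward, _, err := env.Step(action)
	return reward, err
}

func main() {
	r1, err1 := runOnce[int](discreteEnv{n: 2}, 1)
	fmt.Println(r1, err1)
	r2, err2 := runOnce[[]float64](continuousEnv{dims: 2}, []float64{0.7, -1})
	fmt.Println(r2, err2)
}
```

The type arguments are spelled out explicitly in `main` so the example does not depend on how much inference a particular Go release performs.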

> I think some of this would be simpler once generics are available.

Very likely :) Let's wait for it then, thanks.

dtitov avatar Aug 12 '21 18:08 dtitov

Hm, I've now stumbled across a different issue: the arity of the action/step. In the current implementation, the action is a single number (regardless of whether it's an int or a float). But for the lunar lander, the action is a vector of two numbers. How is that supposed to be handled? 🤔

dtitov avatar Aug 12 '21 18:08 dtitov

Ah yeah, I had only tried out a couple of environments with the agents so far and hadn't progressed to lunar lander yet. The action out of the agent probably needs to be a tensor. That would need a change, but I'm pretty swamped with the day job and a toddler right now 🙂

pbarker avatar Aug 12 '21 18:08 pbarker
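To make the "action as a vector" idea concrete, here is a minimal sketch that represents an action as a `[]float64` and clips it to per-component bounds before it would be sent to the environment. The `Box` type and its `Clip` method are hypothetical, loosely modeled on Gym's Box space, and are not part of gold; a real implementation would likely use the library's tensor type instead of a plain slice.

```go
package main

import (
	"fmt"
	"math"
)

// Box is a hypothetical bounded continuous action space with one
// [Low[i], High[i]] interval per action component.
type Box struct {
	Low, High []float64
}

// Clip checks the arity of the action and returns a copy with every
// component clamped to its interval.
func (b Box) Clip(action []float64) ([]float64, error) {
	if len(b.Low) != len(b.High) {
		return nil, fmt.Errorf("malformed box: %d lows, %d highs", len(b.Low), len(b.High))
	}
	if len(action) != len(b.Low) {
		return nil, fmt.Errorf("got %d components, want %d", len(action), len(b.Low))
	}
	out := make([]float64, len(action))
	for i, a := range action {
		out[i] = math.Max(b.Low[i], math.Min(b.High[i], a))
	}
	return out, nil
}

func main() {
	// Two components in [-1, 1], as in the lunar lander comment quoted earlier.
	space := Box{Low: []float64{-1, -1}, High: []float64{1, 1}}
	clipped, err := space.Clip([]float64{1.7, -0.3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(clipped)
}
```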

Right, I see. Thanks anyway.

dtitov avatar Aug 12 '21 19:08 dtitov