[Question] The state in Blackjack-v1

Open liuqi8827 opened this issue 2 years ago • 8 comments

Question

Hi,

I have a question on the state of Blackjack-v1. In Blackjack-v1:

  1. env.observation_space = Tuple(Discrete(32), Discrete(11), Discrete(2))
  2. However, I got the state (**47**, 8, False), where the player's current sum is 47. I think this state should not be possible.

Can you give some suggestions?
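
To make the mismatch concrete, here is a minimal sketch of the membership check implied by the space above. It does not import gym: the helper name in_blackjack_space and the hard-coded sizes are assumptions taken from the Tuple(Discrete(32), Discrete(11), Discrete(2)) printed above, where Discrete(n) holds the integers 0 through n - 1.

```python
# Sizes copied from the observation space reported above (an assumption,
# not read from gym at runtime): player sum, dealer card, usable ace.
SPACE_SIZES = (32, 11, 2)


def in_blackjack_space(obs):
    """Return True if every component of obs lies in its Discrete(n) range."""
    if len(obs) != len(SPACE_SIZES):
        return False
    # Discrete(n) contains the integers 0 .. n - 1; a bool counts as 0 or 1.
    return all(0 <= int(value) < size for value, size in zip(obs, SPACE_SIZES))


print(in_blackjack_space((47, 8, False)))  # the observation from this report
print(in_blackjack_space((14, 8, False)))  # an ordinary in-range observation
```

With a real environment, env.observation_space.contains(obs) should answer the same question.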

liuqi8827 avatar Oct 04 '22 02:10 liuqi8827

That seems like unexpected behavior to me. If you can find a seed that produces it, that would be really useful for debugging.

balisujohn avatar Oct 04 '22 05:10 balisujohn

I figured it out, I will make a PR to fix this, it's actually kind of subtle.

balisujohn avatar Oct 04 '22 19:10 balisujohn

@balisujohn Thanks for your quick reply! The env seed that I set is 0.

liuqi8827 avatar Oct 05 '22 01:10 liuqi8827

lol, I started a seed sweep with an assert to check for out-of-bounds observations and it stopped on seed zero. I thought it was a bug (in my test code) at first.

balisujohn avatar Oct 05 '22 02:10 balisujohn

Try running this code snippet and see if you get an error:

import gym


env = gym.make("Blackjack-v1")

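# Works correctly: the loop exits as soon as done is True,
# so step() is never called after the episode has ended.

obs,info = env.reset(seed = 0)
done = False
while not done:
    action = 1  # 1 = hit
    obs, reward, done, truncated, info = env.step(action)
    print(obs)
    assert obs[0] < 32, obs  # the player's sum must stay inside Discrete(32)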

It seems to me like the issue might be calling step after the environment has returned done=True, which is undefined behavior.
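
To illustrate how stepping past done=True could yield a sum such as 47, here is a toy model written for this thread. It is not gym's implementation: ToyBlackjack, its deck, and its step logic are assumptions for illustration only (aces always count as 1, and there is no dealer). The point is that a hand that keeps receiving cards after busting has no upper bound, while a loop that stops at done can never pass 31, since a non-bust hand is at most 21 and the largest card is 10.

```python
import random

# Hypothetical stand-in for the environment; NOT gym's Blackjack code.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


class ToyBlackjack:
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.player = [self.rng.choice(DECK), self.rng.choice(DECK)]

    def step(self, action):
        """action 1 = hit, 0 = stick. Returns (player_sum, done)."""
        if action == 1:
            self.player.append(self.rng.choice(DECK))
            return sum(self.player), sum(self.player) > 21
        return sum(self.player), True


def max_sum(seed, extra_steps):
    """Hit until done, then keep hitting extra_steps more times."""
    env = ToyBlackjack(seed)
    total, done = env.step(1)
    while not done:
        total, done = env.step(1)
    largest = total
    for _ in range(extra_steps):  # the misuse: stepping after done
        total, _ = env.step(1)
        largest = max(largest, total)
    return largest


# Respecting done keeps every sum within Discrete(32).
print(all(max_sum(seed, extra_steps=0) < 32 for seed in range(1000)))
# Ignoring done lets the sum keep growing.
print(max_sum(seed=0, extra_steps=5))
```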

balisujohn avatar Oct 05 '22 18:10 balisujohn

@balisujohn Thanks for your solution.

I ran the following code, and it worked correctly.

import gym

env = gym.make("Blackjack-v1")

#works correctly

# obs,info = env.reset(seed = 0)
env.seed(0)
obs = env.reset()
done = False
while not done:
    action = 1
    # obs, reward, done, truncated, info = env.step(action)
    obs, reward, done, info = env.step(action)
    print(obs)
    assert obs[0] < 32, obs

I have three questions:

  1. As you can see, I changed obs,info = env.reset(seed = 0) to env.seed(0) followed by obs = env.reset(). Without that change, I got an error:
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    obs,info = env.reset(seed = 0)
  File "/home/xxx/anaconda3/envs/py3.6-xxx/lib/python3.6/site-packages/gym/wrappers/order_enforcing.py", line 16, in reset
    return self.env.reset(**kwargs)
TypeError: reset() got an unexpected keyword argument 'seed'
  2. As you can see, I changed obs, reward, done, truncated, info = env.step(action) to obs, reward, done, info = env.step(action). Without that change, I got an error:
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    obs, reward, done, truncated, info = env.step(action)
ValueError: not enough values to unpack (expected 5, got 4)
  3. The code ran without errors. However, the agent executes only one step before done becomes True, so I do not know whether the original error would occur if the episode ran for more steps.

Thus, I'm curious about these two lines of code: why can you run them successfully when I cannot? I'm also curious about the third question.

liuqi8827 avatar Oct 06 '22 01:10 liuqi8827

I think I was mistaken earlier about there being a logic error. Did your original code, the one that produced the incorrect observation, call step without checking done? Assuming that was the case, I think there is likely no error in the gym code.

You are using a version of gym older than 0.26.0, prior to our API changes. So the two lines you had to change are different between your version and 0.26.0 and later.
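
A minimal sketch of the difference, assuming only what this thread shows: older versions return obs from reset() and a 4-tuple from step(), while 0.26.0 and later return (obs, info) and a 5-tuple. The helpers reset_compat and step_compat and the two stub classes are hypothetical and not part of gym. Treating a 2-tuple whose second element is a dict as (obs, info) is a heuristic, needed here because a Blackjack observation is itself a tuple.

```python
def reset_compat(env, seed=None):
    """Reset env and return (obs, info) under either API."""
    try:
        result = env.reset(seed=seed)  # 0.26.0+ style
    except TypeError:
        if seed is not None:
            env.seed(seed)  # older style, as in the traceback above
        result = env.reset()
    # Heuristic: the new API returns (obs, info) with info being a dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    return result, {}


def step_compat(env, action):
    """Step env and return (obs, reward, terminated, truncated, info)."""
    result = env.step(action)
    if len(result) == 5:
        return result
    obs, reward, done, info = result  # older 4-tuple
    return obs, reward, done, False, info


# Stub environments standing in for the two API generations (illustration only).
class OldStyleEnv:
    def seed(self, seed):
        self.seeded = seed

    def reset(self):
        return (14, 8, False)

    def step(self, action):
        return (24, 8, False), -1.0, True, {}


class NewStyleEnv:
    def reset(self, seed=None):
        return (14, 8, False), {}

    def step(self, action):
        return (24, 8, False), -1.0, True, False, {}


for env in (OldStyleEnv(), NewStyleEnv()):
    obs, info = reset_compat(env, seed=0)
    obs, reward, terminated, truncated, info = step_compat(env, 1)
    print(type(env).__name__, obs, terminated, truncated)
```

Upgrading gym is the simpler route; a shim like this only helps when the same script has to run against both versions.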

As for your third question, that's interesting, what's your gym version?

balisujohn avatar Oct 06 '22 02:10 balisujohn

Hey, we just launched Gymnasium, a fork of Gym by the people who have maintained Gym for the past 18 months. All maintenance and improvements will happen there moving forward. Could you please move this over to the new repo?

If you'd like to read more about the backstory behind this and our plans going forward, click here.

pseudo-rnd-thoughts avatar Oct 25 '22 17:10 pseudo-rnd-thoughts