gym [Question] The state in Blackjack-v1

Question

Hi,

I have a question on the state of Blackjack-v1. In Blackjack-v1:

env.observation_space = Tuple(Discrete(32), Discrete(11), Discrete(2))
However, I got the state (**47**, 8, False), i.e. the player's current sum is 47. I think this state should be impossible, because Discrete(32) only allows sums from 0 to 31.

Can you give some suggestions?
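The bound behind Discrete(32) can be checked with a little arithmetic. This sketch assumes Blackjack-v1 uses the standard card values (ace counted as 1, face cards as 10). A usable ace is only counted as 11 when that keeps the sum at or below 21, so it does not raise the maximum. The player may only hit while the episode is running (sum at most 21), and one card adds at most 10.

```python
# Assumed card values: ace = 1, 2-9 at face value, 10/J/Q/K = 10.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

# A hit is only legal while the episode is running, i.e. with a sum <= 21.
# The worst case is hitting on 21 and drawing a 10-valued card.
max_player_sum = 21 + max(DECK)

# Discrete(32) holds the integer values 0..31.
observation_limit = 32

print(max_player_sum)
print(max_player_sum < observation_limit)
```

So a sum of 47 can only appear if the hand keeps drawing after it has already busted.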

Oct 04 '22 02:10 liuqi8827

That seems like unexpected behavior to me, if you can find a seed that produces that behavior, that would be really useful for debugging.

Oct 04 '22 05:10 balisujohn

I figured it out, I will make a PR to fix this, it's actually kind of subtle.

Oct 04 '22 19:10 balisujohn

@balisujohn Thanks for your quick reply! The env seed that I set is 0.

Oct 05 '22 01:10 liuqi8827

lol, I started a seed sweep with an assert to check for out-of-bounds observations and it stopped at seed 0, so at first I thought it was a bug in my test code.

Oct 05 '22 02:10 balisujohn

Try running this code snippet and see if you get an error:

import gym


env = gym.make("Blackjack-v1")

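# works correctly on gym >= 0.26.0

obs, info = env.reset(seed=0)
done = False
while not done:
    action = 1  # always hit
    obs, reward, terminated, truncated, info = env.step(action)
    # the episode is over if it either terminated or was truncated
    done = terminated or truncated
    print(obs)
    assert obs[0] < 32, obs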

It seems to me like the issue might be calling step after the environment has returned done=True, which is undefined behavior.
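To see how stepping past done can produce a sum like 47, here is a toy, player-only model of the hit action. It is a hypothetical stand-in, not gym's implementation: `ToyBlackjack` ignores aces counted as 11, the dealer, and the stick action, and draws from an assumed deck using `random.Random(seed)`. A loop that stops on termination never sees a sum above 31, while a loop that keeps stepping after the hand has busted quickly leaves the observation space.

```python
import random

# Assumed card values (ace = 1, face cards = 10); hypothetical toy model.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


class ToyBlackjack:
    """Hypothetical player-only Blackjack where 'hit' is the only action."""

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.total = self.rng.choice(DECK) + self.rng.choice(DECK)
        return self.total

    def step(self):
        # Like an env with no guard against stepping after the episode ended:
        # it draws another card even if the hand has already busted.
        self.total += self.rng.choice(DECK)
        terminated = self.total > 21
        return self.total, terminated


def final_sum_stopping_at_done(seed):
    env = ToyBlackjack()
    total = env.reset(seed=seed)
    done = False
    while not done:
        total, done = env.step()
    return total


def final_sum_ignoring_done(seed, steps=6):
    env = ToyBlackjack()
    total = env.reset(seed=seed)
    for _ in range(steps):
        total, _done = env.step()  # done is ignored: undefined behavior
    return total


seeds = range(100)
correct = [final_sum_stopping_at_done(s) for s in seeds]
buggy = [final_sum_ignoring_done(s) for s in seeds]
print(max(correct) < 32)  # the loop stops as soon as the hand busts
print(max(buggy) < 32)
```

The fixed number of extra steps in `final_sum_ignoring_done` is arbitrary; any loop that ignores done can push the sum past 31.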

Oct 05 '22 18:10 balisujohn

@balisujohn Thanks for your solution.

I ran the following code, and it worked correctly.

import gym

env = gym.make("Blackjack-v1")

#works correctly

# obs,info = env.reset(seed = 0)
env.seed(0)
obs = env.reset()
done = False
while not done:
    action = 1
    # obs, reward, done, truncated, info = env.step(action)
    obs, reward, done, info = env.step(action)
    print(obs)
    assert obs[0] < 32, obs

I have three questions:

First, I changed obs,info = env.reset(seed = 0) to env.seed(0) and obs = env.reset(). Without this change, I got an error:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    obs,info = env.reset(seed = 0)
  File "/home/xxx/anaconda3/envs/py3.6-xxx/lib/python3.6/site-packages/gym/wrappers/order_enforcing.py", line 16, in reset
    return self.env.reset(**kwargs)
TypeError: reset() got an unexpected keyword argument 'seed'

Second, I changed obs, reward, done, truncated, info = env.step(action) to obs, reward, done, info = env.step(action). Without this change, I got an error:

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    obs, reward, done, truncated, info = env.step(action)
ValueError: not enough values to unpack (expected 5, got 4)

Third, the code ran correctly, but the agent executes only one step before done becomes True, so I do not know whether the original error would occur if the episode lasted longer.

So I'm curious about these two lines of code: why do they run successfully for you but not for me? I'm also curious about the third question.

Oct 06 '22 01:10 liuqi8827

I think I was mistaken earlier about there being a logic error. Did your original code, which produced the incorrect observation, call step without checking done? If so, there is likely no error in the gym code.

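You are using a version of gym older than 0.26.0, which predates our API changes. Starting with 0.26.0, reset accepts a seed argument and returns (obs, info), and step returns five values (obs, reward, terminated, truncated, info) instead of four. That is why the two lines you had to change differ between your version and 0.26.0 and later.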
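If one script has to run on both sides of the 0.26.0 change, a small adapter can normalize the two signatures. This is a sketch, not part of gym's API: `reset_compat` and `step_compat` are hypothetical helper names, the two stub classes only mimic the old and new return shapes, and the reset check assumes the observation itself is not a 2-tuple ending in a dict (true for Blackjack's 3-tuple observation).

```python
def reset_compat(env, seed=None):
    """Return only the observation, under either reset API."""
    try:
        result = env.reset(seed=seed)  # newer gym accepts seed=...
    except TypeError:
        env.seed(seed)                 # older gym: seed separately
        result = env.reset()
    # gym >= 0.26 returns (obs, info); older versions return obs alone.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


def step_compat(env, action):
    """Return (obs, reward, done, info) under either step API."""
    result = env.step(action)
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result
    return obs, reward, done, info


class NewApiStub:
    """Hypothetical stub with gym >= 0.26 return shapes."""

    def reset(self, seed=None):
        return (14, 8, False), {}

    def step(self, action):
        return (24, 8, False), -1.0, True, False, {}


class OldApiStub:
    """Hypothetical stub with pre-0.26 return shapes."""

    def seed(self, seed=None):
        return [seed]

    def reset(self):
        return (14, 8, False)

    def step(self, action):
        return (24, 8, False), -1.0, True, {}


for env in (NewApiStub(), OldApiStub()):
    obs = reset_compat(env, seed=0)
    obs, reward, done, info = step_compat(env, 1)
    print(type(env).__name__, obs, reward, done)
```

In practice you would pass gym.make("Blackjack-v1") instead of the stubs. Catching TypeError is a blunt version check, since it can also hide unrelated TypeErrors raised inside reset; comparing gym.__version__ is a more explicit alternative.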

As for your third question, that's interesting. What's your gym version?

Oct 06 '22 02:10 balisujohn

Hey, we just launched gymnasium, a fork of Gym by the maintainers of Gym for the past 18 months where all maintenance and improvements will happen moving forward. Could you please move this over to the new repo?

If you'd like to read more about the backstory behind this and our plans going forward, click here.

Oct 25 '22 17:10 pseudo-rnd-thoughts
