gym
[Question] The state in Blackjack-v1
Question
Hi,
I have a question on the state of Blackjack-v1. In Blackjack-v1:
- env.observation_space = Tuple(Discrete(32), Discrete(11), Discrete(2))
- However, I got the state (**47**, 8, False), where the player's current sum is 47. I think this state should be impossible.
Can you give some suggestions?
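For anyone who wants to catch states like this automatically, here is a minimal sketch of a bounds check, using only the sizes quoted in the question. The helper name `in_bounds` and the constant `BLACKJACK_SIZES` are made up for illustration and are not part of gym; with a real environment, `env.observation_space.contains(obs)` should serve the same purpose.

```python
# Sizes taken from the observation space quoted in the question:
# Tuple(Discrete(32), Discrete(11), Discrete(2)).
BLACKJACK_SIZES = (32, 11, 2)


def in_bounds(obs, sizes=BLACKJACK_SIZES):
    """Return True if every component of obs fits its Discrete(n) range.

    Hypothetical helper for illustration; not part of gym.
    """
    if len(obs) != len(sizes):
        return False
    # Discrete(n) holds the integers 0 .. n-1; bools count as 0 or 1.
    return all(0 <= int(value) < n for value, n in zip(obs, sizes))


print(in_bounds((47, 8, False)))  # the reported state
print(in_bounds((20, 8, False)))  # an ordinary state
```

The reported state fails the check because 47 is outside the 0..31 range of the first component.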
That seems like unexpected behavior to me. If you can find a seed that produces it, that would be really useful for debugging.
I figured it out, I will make a PR to fix this, it's actually kind of subtle.
@balisujohn Thanks for your quick reply! The env seed that I set is 0.
lol, I started a seed sweep with an assert to check for out-of-bounds observations and it stopped on seed zero; I thought it was a bug (in my test code) at first.
Try running this code snippet and see if you get an error:
import gym

env = gym.make("Blackjack-v1")
# works correctly
obs, info = env.reset(seed=0)
done = False
while not done:
    action = 1  # always hit
    obs, reward, done, truncated, info = env.step(action)
    print(obs)
    # the player's sum must stay inside Discrete(32)
    assert obs[0] < 32, obs
It seems to me like the issue might be calling step after the environment has returned done=True, which is undefined behavior.
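One way to make that mistake fail loudly instead of silently producing out-of-range states is a small guard around the environment. This is only a sketch: `DoneGuard`, `StepAfterDoneError`, and `FakeEnv` are hypothetical names invented here, not gym classes, and the guard assumes the 5-tuple step return `(obs, reward, terminated, truncated, info)`.

```python
class StepAfterDoneError(RuntimeError):
    """Raised when step() is called on a finished episode (hypothetical name)."""


class DoneGuard:
    """Hypothetical wrapper: refuses step() until reset() after an episode ends.

    Assumes the wrapped env returns the 5-tuple
    (obs, reward, terminated, truncated, info) from step().
    """

    def __init__(self, env):
        self.env = env
        self._needs_reset = True  # no episode is active before the first reset

    def reset(self, **kwargs):
        self._needs_reset = False
        return self.env.reset(**kwargs)

    def step(self, action):
        if self._needs_reset:
            raise StepAfterDoneError("call reset() before stepping again")
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            self._needs_reset = True
        return obs, reward, terminated, truncated, info


class FakeEnv:
    """Stand-in env for illustration only: every episode ends on the first step."""

    def reset(self, **kwargs):
        return (12, 8, False), {}

    def step(self, action):
        return (22, 8, False), -1.0, True, False, {}


env = DoneGuard(FakeEnv())
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(1)
try:
    env.step(1)  # the episode is over, so the guard refuses
except StepAfterDoneError as err:
    print("caught:", err)
```

With a guard like this, a loop that forgets to check the end-of-episode flag stops at the first extra step rather than continuing to draw cards.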
@balisujohn Thanks for your solution.
I ran the following code, and it worked correctly.
import gym

env = gym.make("Blackjack-v1")
# works correctly
# obs, info = env.reset(seed=0)
env.seed(0)
obs = env.reset()
done = False
while not done:
    action = 1
    # obs, reward, done, truncated, info = env.step(action)
    obs, reward, done, info = env.step(action)
    print(obs)
    assert obs[0] < 32, obs
I have three questions:
- As you can see, I changed obs,info = env.reset(seed = 0) to env.seed(0) and obs = env.reset(). Without this change, I got an error:
Traceback (most recent call last):
File "test.py", line 10, in <module>
obs,info = env.reset(seed = 0)
File "/home/xxx/anaconda3/envs/py3.6-xxx/lib/python3.6/site-packages/gym/wrappers/order_enforcing.py", line 16, in reset
return self.env.reset(**kwargs)
TypeError: reset() got an unexpected keyword argument 'seed'
- As you can see, I changed obs, reward, done, truncated, info = env.step(action) to obs, reward, done, info = env.step(action). Without this change, I got an error:
Traceback (most recent call last):
File "test.py", line 16, in <module>
obs, reward, done, truncated, info = env.step(action)
ValueError: not enough values to unpack (expected 5, got 4)
- The code now runs correctly. However, I found that the agent executes only one step before done becomes True. I do not know whether the original error would occur if the episode ran for more steps.
So I'm curious about these two lines of code: why can you run them successfully when I cannot? I'm also curious about the third question.
I think I was mistaken earlier about there being a logic error. Did your original code which produced the incorrect observation call step without checking done? Assuming that was the case, I think likely there is no error in the gym code.
You are using a version of gym older than 0.26.0, prior to our API changes. So the two lines you had to change are different between your version and 0.26.0 and later.
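If the same script has to run on both sides of that API change, one option is to normalize the return values yourself. The sketch below is an assumption-laden illustration, not something gym provides: `unpack_reset` and `unpack_step` are hypothetical helper names, and filling in `truncated=False` for the older 4-tuple is a simplification, since the older API has no separate truncated flag.

```python
def unpack_reset(result):
    """Return (obs, info) from either reset() return style (hypothetical helper)."""
    # Newer API: (obs, info) with info a dict. A Blackjack observation is
    # itself a 3-tuple, so check the shape rather than just the type.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    return result, {}  # older API: the bare observation


def unpack_step(result):
    """Return (obs, reward, terminated, truncated, info) from either step() style."""
    if len(result) == 5:
        return result
    obs, reward, done, info = result  # older 4-tuple API
    # Assumption: treat the old `done` as `terminated` and report no truncation.
    return obs, reward, done, False, info


# Values shaped like the two API styles, for illustration only.
print(unpack_reset((12, 8, False)))
print(unpack_reset(((12, 8, False), {})))
print(unpack_step(((22, 8, False), -1.0, True, {})))
```

In a real script these would wrap the calls directly, for example `obs, info = unpack_reset(env.reset())`; note that how the seed is passed still differs between versions, as the two tracebacks in the question show.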
As for your third question, that's interesting, what's your gym version?
Hey, we just launched gymnasium, a fork of Gym by the maintainers of Gym for the past 18 months where all maintenance and improvements will happen moving forward. Could you please move this over to the new repo?
If you'd like to read more about the backstory behind this and our plans going forward, click here.