[Question] Setting the state of vectorized environments
Question
I cannot set the state of vectorized environments.
For a non-vectorized environment we can do the following and it will set the state of the environment:
`env.state = np.array([0, 1, 2, 3])`
For vectorized environments this does not work. Whether we try to set each env individually (`env.state`) or in batch (`env_vectorized.observations`), once `.step()` is called it uses the original random values generated by `.reset()`.
Is there a way to achieve this behaviour, or is this a bug, or has it simply not been implemented yet?
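For reference, a minimal sketch of the single-environment behaviour I am relying on (CartPole-v1, whose state is a 4-element array; the values here are just placeholders):

```python
import gym
import numpy as np

env = gym.make("CartPole-v1").env   # unwrap TimeLimit so the assignment reaches the CartPole env
env.reset()
env.state = np.array([0.0, 1.0, 2.0, 3.0])  # overwrite the internal state
obs, reward, done, info = env.step(1)       # this step continues from the state set above
print(obs)
```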
```python
import gym
import numpy as np

# CREATE ENVS
nn = 3
env_vect = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1").env for _ in range(nn)])
print(0, type(env_vect.observations), env_vect.observations)  # --> returns array of zeros OK
env_vect.reset()
print(1, type(env_vect.observations), env_vect.observations)  # --> returns array of random numbers OK

# SET EACH ENV INDIVIDUALLY
for env in env_vect.envs:
    # print("1a", env.state)
    env.state = np.array([1, 2, 3, 4])
    print("1b", env.state)  # --> returns np.array([1, 2, 3, 4]) OK
print(2, env_vect.observations)  # --> returns original random values NOT OK!
for env in env_vect.envs:
    print("2a", env.state)  # --> returns np.array([1, 2, 3, 4]) OK
print(env_vect.step([1 for i in range(nn)]))  # --> returns new values based on the original random values NOT OK!

# SET ALL ENVs AT THE SAME TIME
env_vect.observations = np.array([[11, 12, 13, 14], [11, 12, 13, 14], [11, 12, 13, 14]]).astype(float)
print(3, env_vect.observations)  # --> returns new values OK
for env in env_vect.envs:
    print("3a", env.state)  # --> still the values set individually NOT OK!
env_vect.step([1 for i in range(nn)])  # --> returns new values based on the original random values NOT OK!
```
Please could you provide an example of what you want to do in the single-environment case, since running reset will change the state data?
For vector environments, there is a set_attr function that allows you to modify a sub-environment's data:
```python
vec_env = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=False)
vec_env.set_attr("state", np.array([0, 1, 2, 3]))
```
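A sketch of how set_attr is typically combined with reset and step (assuming one value per sub-environment passed as a list; get_attr is only used here to check what was set):

```python
import gym
import numpy as np

vec_env = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=False)
vec_env.reset()  # reset first so every sub-environment is initialised

# one state per sub-environment, passed as a list
states = [np.array([0.0, 0.1, 0.2, 0.3]) for _ in range(3)]
vec_env.set_attr("state", states)
print(vec_env.get_attr("state"))  # inspect what each sub-environment now reports

obs, rewards, dones, infos = vec_env.step([1 for _ in range(3)])
```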
Hi pseudo-rnd-thoughts, thanks for the quick response.
I am trying to implement an asynchronous vectorized environment where multiple environments generate experience until each of them has reached a terminal condition at least once. When an environment finishes early, it should restart and generate the next experience trajectory.
However, I would like to have control over the starting position. I have a piece of code that sets specific state values at the start of an episode, and it works well with non-vectorized environments, but once I switch to the vectorized environment I lose this capability.
Your suggestion does not work either, as the code below shows: assigning values with set_attr() does not change those values in the environment, and training still uses the values generated by .reset().
```python
from copy import deepcopy
import gym
import numpy as np

nn = 3
env_vect = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=True)
# vec_env.set_attr("state", np.array([0, 1, 2, 3]))
current_state = env_vect.reset()
print("current_state", current_state)
print("self.env.state", env_vect.observations)

a = np.array([[-1.5, 0.4157, -0.0285, -0.6019],
              [ 1.5, 0.3979, -0.0239, -0.6019],
              [ 1.5, 0.8187, -0.072 , -1.2578]])
env_vect.set_attr("state", a)
env_vect.set_attr("observations", a)
# env_vect.observations = a
print("env_vect.observations", env_vect.observations)

for i in range(50):
    next_state, reward, done, info = env_vect.step([1 for i in range(nn)])
    current_state = deepcopy(next_state)
    print(i, env_vect.observations, next_state, done, info)
```
And if I run only this:

```python
env_vect = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=True)
vec_env.set_attr("state", np.array([0, 1, 2, 3]))
```

I am getting an error:
AssertionError: Cannot call env.step() before calling reset()
What would be the best course of action in such a case?
If you have a working version of the problem you are trying to solve in the single-environment case, then you should just need to modify it to use set_attr.
Otherwise, I would implement your own custom version of the environment that removes the start state from a list on each reset.
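A minimal sketch of that last idea (the FixedStartWrapper name, the start_states argument, and the use of gym.Wrapper are assumptions, not code from this thread): each sub-environment is wrapped so that reset() pops the next start state from its own list and writes it into CartPole's internal state, falling back to the normal random reset once the list is empty.

```python
import gym
import numpy as np

class FixedStartWrapper(gym.Wrapper):
    """Hypothetical wrapper: reset() pops a predefined start state instead of a random one."""

    def __init__(self, env, start_states):
        super().__init__(env)
        self.start_states = list(start_states)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        if self.start_states:
            state = np.array(self.start_states.pop(0), dtype=np.float64)
            self.env.unwrapped.state = state  # overwrite CartPole's internal state
            obs = state.copy()                # report the chosen state as the first observation
        return obs

# one list of start states per sub-environment
start_states = [
    [[0.0, 0.1, 0.2, 0.3]],
    [[0.5, 0.0, -0.01, 0.0]],
    [[-0.5, 0.2, 0.02, -0.1]],
]
env_vect = gym.vector.SyncVectorEnv(
    [lambda s=s: FixedStartWrapper(gym.make("CartPole-v1"), s) for s in start_states]
)
print(env_vect.reset())  # each sub-environment starts from the state supplied above
```

Because the vector env calls each sub-environment's reset() when it terminates, restarts after an early termination should also go through the wrapper and can be controlled from the same list.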