[Question] Question title

Open sebtac opened this issue 1 year ago • 3 comments

Question

I cannot set the state for the vectorized environments:

For non-vectorized environments we can do the following and it will set the state of the environment.

env.state = np.array([0,1,2,3])

For vectorized environments this does not work. Whether we set each env individually (env.state) or in batch (env_vectorized.observations), once .step() is called it uses the original random values generated by the .reset() function.

Is there a way to achieve that behavior? Or is this a bug, or has it simply not been implemented yet?

CREATE ENVS

nn = 3

env_vect = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1").env for _ in range(nn)])

print(0, type(env_vect.observations), env_vect.observations) # --> Returns array of zeros OK

env_vect.reset()

print(1, type(env_vect.observations), env_vect.observations) # --> returns array of random numbers OK

SET EACH ENV INDIVIDUALLY

for env in env_vect.envs:
    #print("1a", env.state)
    env.state = np.array([1,2,3,4])
    print("1b", env.state) # --> returns values of np.array([1,2,3,4]) OK

print(2, env_vect.observations) # --> returns original random values NOT OK!

for env in env_vect.envs: print("2a", env.state) # --> returns values of np.array([1,2,3,4]) OK

print(env_vect.step([1 for i in range(nn)])) # --> returns new values that are based on the original random values NOT OK!

SET ALL ENVs at the same time

env_vect.observations = np.array([[11,12,13,14], [11,12,13,14],[11,12,13,14]]).astype(float)

print(3, env_vect.observations) # --> returns new values OK

for env in env_vect.envs: print("3a", env.state) # --> returns the values set individually NOT OK!

env_vect.step([1 for i in range(nn)]) # --> returns new values that are based on the original random values NOT OK!
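A possible factor in the behaviour above, offered as a hedged check rather than a diagnosis: CartPole's documented termination limits are a cart position beyond ±2.4 and a pole angle beyond ±12°. A start state far outside those limits would be flagged as finished on the first step, and if the vector environment then resets that sub-environment automatically, the observation returned would come from a fresh random reset. The sketch below only tests states against those limits; `is_within_bounds` is a hypothetical helper, not part of gym.

```python
import math

# CartPole termination limits as documented for CartPole-v1.
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # pole angle limit in radians (12 degrees)


def is_within_bounds(state):
    """Return True if (x, x_dot, theta, theta_dot) lies inside the non-terminal region."""
    x, _x_dot, theta, _theta_dot = state
    return abs(x) <= X_THRESHOLD and abs(theta) <= THETA_THRESHOLD


# States taken from the snippets in this thread.
candidates = {
    "set individually": [1, 2, 3, 4],
    "set in batch": [11, 12, 13, 14],
    "later attempt": [-1.5, 0.4157, -0.0285, -0.6019],
}

for label, state in candidates.items():
    print(label, state, "within bounds:", is_within_bounds(state))
```

Only the third state has both position and angle inside the limits, so the first two may not be good test values for checking whether a manual state assignment survives a step.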

sebtac avatar Feb 26 '23 19:02 sebtac

Could you please provide an example of what you want to do in the single-environment case? Running reset will change the state data.

For vector environments, there is a set_attr function that allows you to modify sub-environment's data.

vec_env = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=False)
vec_env.set_attr("state", np.array([0, 1, 2, 3]))
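One caveat with this approach, sketched with stand-in classes rather than gym itself: if each sub-environment is wrapped, and the wrapper forwards attribute reads to the inner environment but not attribute writes, then setting `state` on the wrapper would leave the inner environment's state untouched. `ToyEnv` and `ToyWrapper` below are hypothetical and only model that assumption; whether it applies depends on the gym version and on which wrappers are in place.

```python
class ToyEnv:
    """Stand-in for an environment whose dynamics read self.state."""

    def __init__(self):
        self.state = [0.0, 0.0, 0.0, 0.0]

    def step(self):
        # Toy dynamics: add 1 to every component of the inner state.
        self.state = [s + 1.0 for s in self.state]
        return list(self.state)


class ToyWrapper:
    """Stand-in for a wrapper that forwards attribute reads but not writes."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only called when normal lookup fails, so reads fall through to the inner env.
        return getattr(self.env, name)

    def step(self):
        return self.env.step()

    @property
    def unwrapped(self):
        return self.env


wrapped = ToyWrapper(ToyEnv())

# Writing through the wrapper creates a new attribute on the wrapper itself.
setattr(wrapped, "state", [10.0, 10.0, 10.0, 10.0])
obs_shadowed = wrapped.step()   # still driven by the inner state

# Writing to the innermost environment changes what step() uses.
wrapped.unwrapped.state = [10.0, 10.0, 10.0, 10.0]
obs_direct = wrapped.step()

print("after writing to wrapper:", obs_shadowed)
print("after writing to inner env:", obs_direct)
```

In this toy model the first assignment appears to succeed (reading `wrapped.state` returns the new values) while `step()` keeps using the old ones, which resembles the symptom described in the question.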

pseudo-rnd-thoughts avatar Feb 26 '23 23:02 pseudo-rnd-thoughts

Hi pseudo-rnd-thoughts, thx for quick response

I am trying to implement an asynchronous vectorized environment where multiple envs generate examples until all of them have reached a terminal condition at least once. When an environment finishes early, it should restart and generate the next experience trajectory.

But I would like to have control over the starting position. I have a piece of code that defines the specific state values at the start of the episode, and it works well with non-vectorized environments. But once I switch to the vectorized environment I lose this capability.

Your suggestion does not work either, as the code below shows. Assigning values with set_attr() does not change those values in the environment; the training still uses the values generated by .reset().

`from copy import deepcopy

env_vect = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=True)
#vec_env.set_attr("state", np.array([0, 1, 2, 3]))

current_state = env_vect.reset()

print("current_state", current_state)
print("self.env.state", env_vect.observations)

a = np.array([[-1.5, 0.4157, -0.0285, -0.6019],
              [ 1.5, 0.3979, -0.0239, -0.6019],
              [ 1.5, 0.8187, -0.072 , -1.2578]])
env_vect.set_attr("state", a)
env_vect.set_attr("observations", a)
#env_vect.observations = a

print("env_vect.observations", env_vect.observations)

for i in range(50):
    next_state, reward, done, info = env_vect.step([1 for i in range(nn)])
    current_state = deepcopy(next_state)
    print(i, env_vect.observations, next_state, done, info)

`

and if I run only this:

env_vect = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=True)
vec_env.set_attr("state", np.array([0, 1, 2, 3]))

I am getting an error


AssertionError: Cannot call env.step() before calling reset()


What would be the best course of action in such a case?

sebtac avatar Mar 04 '23 03:03 sebtac

If you have a working version of the problem you are trying to solve in the single-environment case, then you should just need to modify it to use set_attr. Otherwise, I would implement your own custom version of the environment that removes a state from a list on each reset.
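A minimal sketch of that second suggestion, assuming the underlying environment exposes a writable `state` attribute and that its observation is the state itself (as the CartPole snippets in this thread assume). `StartStateQueue` and `DummyEnv` are hypothetical names; `DummyEnv` only stands in for a real environment so the example runs without gym, and its `reset()` returns just the observation.

```python
from collections import deque


class DummyEnv:
    """Stand-in environment: reset() always produces the same default state."""

    def __init__(self):
        self.state = None

    def reset(self):
        self.state = [0.0, 0.0, 0.0, 0.0]
        return list(self.state)


class StartStateQueue:
    """Hypothetical wrapper: each reset() takes the next start state from a list."""

    def __init__(self, env, start_states):
        self.env = env
        self._starts = deque(list(s) for s in start_states)

    def reset(self):
        obs = self.env.reset()          # let the env do its normal reset first
        if not self._starts:
            return obs                  # list exhausted: keep the env's own start
        start = self._starts.popleft()  # remove the next start state from the list
        self.env.state = list(start)    # overwrite the freshly reset state
        return list(start)              # assumes observation == state


env = StartStateQueue(
    DummyEnv(),
    [[-1.5, 0.4157, -0.0285, -0.6019], [1.5, 0.3979, -0.0239, -0.6019]],
)

first = env.reset()
second = env.reset()
third = env.reset()   # no start states left, falls back to the default reset
print(first, second, third)
```

If the vector environment calls each sub-environment's `reset()` whenever an episode ends, building every sub-environment through a wrapper like this would apply the chosen start state on each restart, provided the wrapper sits directly around the object that owns `state`.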

pseudo-rnd-thoughts avatar Mar 04 '23 23:03 pseudo-rnd-thoughts