
[BUG] I initialized 8 environments and expected the reset function to return 8 different states, but I found they are exactly the same.

Open · bisonliao opened this issue 7 months ago · 4 comments

Describe the bug

I initialized 8 environments and expected the reset function to return 8 different states, but they are all exactly the same. I then applied different actions to these 8 environments, expecting the returned next observations to differ, but they were still identical.

I also tried the code from the CleanRL project and encountered the same issue. The snippet is the same as the one under To Reproduce below.

output:

(array([[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
       [-0.00031387, -0.00031387, -0.00031387, -0.00031387]], dtype=float32),
 {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32),
  'players': {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)},
  'elapsed_step': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)})
[[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]]

To Reproduce

import envpool
import torch

# create 8 CartPole environments with one base seed; each env should still
# get its own initial state
envs = envpool.make("CartPole-v1", num_envs=8, seed=43, env_type="gymnasium", batch_size=8)
obs = envs.reset()
print(obs)

# push envs 3 and 5 with a different action than the rest
actions = torch.zeros((8,), dtype=torch.int32)
actions[3] = 1
actions[5] = 1
next_obs, _, _, _, _ = envs.step(actions.numpy())
print(next_obs)
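
A quick way to make the symptom explicit, as a sketch against the snippet above (it assumes reset returns an (obs, info) tuple, which matches the printed output):

import numpy as np

obs, info = envs.reset()
# with per-env seeding, initial CartPole states should differ across envs;
# a single unique value across the whole batch means the bug is present
print(np.unique(np.asarray(obs)).size)  # 1 while the bug is present, >1 once fixed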

bisonliao · May 19 '25 01:05

I also see that "obs" is filled with exactly the same value for all dimensions of all environments. If you find a solution or an explanation, please share.

omertarikbanus · Jul 17 '25 14:07

Try pip install "numpy<2"
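
If that workaround helps, it suggests (though this thread does not confirm the mechanism) that under NumPy 2 the returned arrays end up with broken strides: an array whose strides are all zero reads the same element at every index, which is exactly the symptom above. A small NumPy illustration of that failure mode:

import numpy as np
from numpy.lib.stride_tricks import as_strided

buf = np.arange(32, dtype=np.float32)                    # 32 distinct values
broken = as_strided(buf, shape=(8, 4), strides=(0, 0))   # every index maps to buf[0]
print(broken)                  # 8 identical rows of the same repeated value
print(np.unique(broken).size)  # 1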

wty-yy · Aug 13 '25 15:08

I can reproduce the bug with make_gymnasium(task_id="HealthGathering-v1", ..., seed=123).
For the record, I am using the latest envpool version (0.8.4) and NumPy 1.26.4.

AlexandreBrown · Oct 17 '25 00:10

I found a solution that works for my case. I also created a PR, but I don't know whether the maintainers will accept it.

Here is how I did it: open envpool/envpool/core/py_envpool.h

Replace the return statement on line 46,

return py::array(a.Shape(), reinterpret_cast<dtype*>(a.Data()), capsule);

with this:

std::vector<py::ssize_t> strides(a.Shape().size());
if (!strides.empty()) {
  // innermost dimension: consecutive elements are one item apart
  strides[strides.size() - 1] = sizeof(dtype);
  // each outer stride is the product of all inner extents
  for (int i = strides.size() - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * a.Shape()[i + 1];
  }
}
return py::array(py::dtype::of<dtype>(), a.Shape(), strides,
                 reinterpret_cast<dtype*>(a.Data()), capsule);
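
For intuition, the strides filled in above are the standard C-contiguous (row-major) strides, spelled out explicitly instead of left implicit. A minimal Python sketch of the same computation, assuming a float32 array of shape (8, 4):

shape = (8, 4)
itemsize = 4                       # sizeof(float32)
strides = [0] * len(shape)
strides[-1] = itemsize             # innermost dimension: one element apart
for i in range(len(shape) - 2, -1, -1):
    strides[i] = strides[i + 1] * shape[i + 1]
print(strides)  # [16, 4], matching np.zeros((8, 4), np.float32).strides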

This solves the problem for me. I am running the code with no issues on my Intel Ubuntu PCs and on a Mac M4 with an Ubuntu 22.04 Docker image. The NumPy version is 2.2.5.

omertarikbanus · Oct 17 '25 16:10