
The first selfplay worker uses the same seed for all parallel environments

rPortelas opened this issue 3 years ago

I might have found an unexpected behavior in how parallel training environments are being seeded.

I am referring to this line: https://github.com/YeWR/EfficientZero/blob/c533ebf5481be624d896c19f499ed4b2f7d7440d/core/selfplay_worker.py#L112

Because the rank of the first selfplay worker is 0, parallel environments are being initialized with the same seed, which might reduce training data diversity.

We could go for a simple fix like replacing self.rank with (self.rank + 1); however, this is still problematic with multiple workers, as their seeds will overlap anyway.
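To see why the "+1" patch is not enough, here is a small sketch, assuming the seeding line computes per-environment seeds roughly as config.seed + rank * i (a hypothetical reconstruction; the exact expression lives at core/selfplay_worker.py#L112):

```python
# Hypothetical reconstruction of the per-env seeding with the "+1" fix applied:
# worker with rank r seeds env i with base_seed + (r + 1) * i.
base_seed = 0
env_nums = 8

def env_seeds(rank):
    return {base_seed + (rank + 1) * i for i in range(env_nums)}

worker0 = env_seeds(0)  # {0, 1, 2, ..., 7}
worker1 = env_seeds(1)  # {0, 2, 4, ..., 14}
# The two workers still share several seeds:
print(sorted(worker0 & worker1))  # → [0, 2, 4, 6]
```

So even with distinct nonzero multipliers, the arithmetic progressions from different ranks intersect.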

A good option might be to sample a seed for each parallel environment using numpy (which is seeded before launching data workers). For instance:

envs = [self.config.new_game(np.random.randint(10**9)) for _ in range(env_nums)]

rPortelas avatar May 25 '22 15:05 rPortelas

Ditto, but drawing from the global numpy RNG with randint may make runs irreproducible.
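A minimal sketch of the concern: seeds drawn from the global numpy stream depend on every earlier draw in the process, so any unrelated consumer of np.random shifts them between runs.

```python
import numpy as np

# Run 1: draw three env seeds from the global stream.
np.random.seed(0)
run1 = [np.random.randint(10**9) for _ in range(3)]

# Run 2: same base seed, but some other component consumes the
# global stream first, e.g. a single extra uniform draw.
np.random.seed(0)
np.random.rand()
run2 = [np.random.randint(10**9) for _ in range(3)]

print(run1 == run2)  # → False: the env seeds are no longer reproducible
```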

jamesliu avatar May 25 '22 16:05 jamesliu

Hmm right right. Thanks for the input.

Then we could use a dedicated random state created from the original seed:

rnd_state = np.random.RandomState(self.config.seed + self.rank)
envs = [self.config.new_game(rnd_state.randint(10**9)) for _ in range(env_nums)]
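A quick sanity check of this approach (worker_env_seeds is a hypothetical helper wrapping the two lines above): the dedicated RandomState makes each worker's seed stream reproducible and insulated from the global numpy state, and different ranks get different streams.

```python
import numpy as np

def worker_env_seeds(base_seed, rank, env_nums=4):
    # Dedicated random state derived from the original seed and the worker rank.
    rnd_state = np.random.RandomState(base_seed + rank)
    return [rnd_state.randint(10**9) for _ in range(env_nums)]

first = worker_env_seeds(0, rank=0)
np.random.rand()  # unrelated global-RNG activity no longer matters
second = worker_env_seeds(0, rank=0)
print(first == second)  # → True: same worker, same seeds on every run

other = worker_env_seeds(0, rank=1)
print(first == other)   # → False: a different rank gets a different stream
```

Collisions between ranks are still possible in principle (two streams can emit the same integer), but they are now rare accidents rather than systematic overlap.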

rPortelas avatar May 25 '22 17:05 rPortelas