MicroRTS-Py
Goal: achieving generalized training and evaluation for agents
There are several steps for this goal:
- [x] #72
- [x] #68
- [x] #75
- [ ] Bridge to connect PCG with training and evaluation
Support multiple maps during training
As we discussed, the naïve way of supporting this feature is to re-initialize the vectorized environment, along with the storage variables such as `obs` and `actions`, every once in a while.
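A minimal sketch of this periodic re-initialization, under stated assumptions: `MAP_POOL`, `SWAP_INTERVAL`, `make`-style factory, and `DummyVecEnv` are all hypothetical names for illustration, not part of the actual codebase.

```python
import random

MAP_POOL = ["maps/A.xml", "maps/B.xml"]  # hypothetical map pool
SWAP_INTERVAL = 1000  # re-initialize every N steps (tunable)

class DummyVecEnv:
    """Stand-in for the real vectorized environment; only tracks its maps."""
    def __init__(self, map_paths):
        self.map_paths = map_paths

    def reset(self):
        return [0] * len(self.map_paths)  # placeholder observations

def maybe_reinit(step, env, num_envs):
    """Every SWAP_INTERVAL steps, rebuild the vec env on freshly sampled
    maps; the rollout storage must be re-created alongside it."""
    if step % SWAP_INTERVAL == 0:
        maps = random.choices(MAP_POOL, k=num_envs)
        env = DummyVecEnv(maps)
        obs = env.reset()  # storage variables (obs, actions, ...) reset here
        return env, obs
    return env, None
```

The obvious downside is that re-initialization throws away all in-progress episodes at once, which is why the per-sub-environment swap below is preferable.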
However, the most desirable way, I think, is to swap the map as soon as a sub-environment in the vectorized environment is done.
For instance, if the 5th index of `done` in the following code is `True`:
https://github.com/vwxyzjn/gym-microrts/blob/3d7a42f46efbd39a0b806388b8a445fbee48d00f/gym_microrts/envs/vec_env.py#L339
Then we know the 5th environment is done, and we can do something slightly hacky like:

```python
responses = self.vec_client.gameStep(self.actions, [0] * self.num_envs)
reward, done = np.array(responses.reward), np.array(responses.done)
obs = [self._encode_obs(np.array(ro)) for ro in responses.observation]
infos = [{"raw_rewards": item} for item in reward]
# reset the 5th environment with a new map
if done[:, 0][4]:
    self.vec_client.clients[4].mapPath = desired_new_map_path
    response = self.vec_client.clients[4].reset()
    obs[4] = self._encode_obs(np.array(response.observation))
return np.array(obs), reward @ self.reward_weight, done[:, 0], infos
```
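The hard-coded index 4 generalizes to every finished sub-environment. Here is a sketch of that loop with the Java-side client stubbed out; `MAP_POOL`, `StubClient`, and `swap_finished_maps` are hypothetical names, and only the `mapPath`/`reset()` usage mirrors the snippet above.

```python
import random
import numpy as np

MAP_POOL = ["maps/A.xml", "maps/B.xml"]  # hypothetical map pool

class StubClient:
    """Stand-in for a single client; records its map and reset calls."""
    def __init__(self):
        self.mapPath = None
        self.resets = 0

    def reset(self):
        self.resets += 1
        return self  # stand-in for the real reset response

def swap_finished_maps(clients, done):
    """For every sub-environment whose episode just ended, point its client
    at a freshly sampled map and reset it (the single-index hack, vectorized)."""
    for i in np.flatnonzero(done):
        clients[i].mapPath = random.choice(MAP_POOL)
        clients[i].reset()
```

With this in place, each episode can start on a different map without ever tearing down the whole vectorized environment.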