Syllabus
Multi_car_racing and Domain Randomization Progress
Here's the current state of my work on `multi_car_racing` and Domain Randomization:
Installation:
- Running the script still requires Docker for now. I also copied the `multi_car_racing` repository for convenience (for some reason, installing and importing it the usual way didn't work); this will be cleaned up later when we're ready to move on to later stages of the project.
- I upgraded PettingZoo to 1.23 and SuperSuit to 3.7.2, since PettingZoo < 1.23 had a typo preventing an import (`BaseParallelWraper` was renamed to `BaseParallelWrapper`).
- There was a circular import in `curriculum_sync_wrapper.py`:

```python
from syllabus.core import Curriculum, decorate_all_functions  # circular import
# replaced with:
from syllabus.core import Curriculum
from .utils import decorate_all_functions  # fixed the problem
```
Script:
- The task wrapper for `multi_car_racing` seems to work as expected.
- The curriculum setup runs without any errors:

```python
env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
curriculum = DomainRandomization(env.task_space)
curriculum, task_queue, update_queue = make_multiprocessing_curriculum(curriculum)
```
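For reference, my understanding from the Syllabus CleanRL examples (not verified against the version we're pinned to) is that sampled tasks reach the environment through a sync wrapper fed by these queues; the wrapper name and argument list below are assumptions recalled from those examples and may differ in detail:

```python
# Sketch of the env-side wiring, based on my reading of the Syllabus CleanRL
# examples; MultiProcessingSyncWrapper and its exact arguments are assumptions,
# not verified against the version we have pinned.
from syllabus.core import MultiProcessingSyncWrapper

def make_env():
    env = ...  # build the raw multi_car_racing env as before
    env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
    # Tasks sampled by the curriculum arrive via task_queue; progress/metrics
    # go back to the curriculum via update_queue.
    env = MultiProcessingSyncWrapper(
        env, task_queue, update_queue, task_space=env.task_space
    )
    return env
```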
- However, I'm still unsure of how to update the DR curriculum compared to PLR:

```python
# TODO: adapt to DR
if global_cycles % num_steps == 0:
    update = {
        "update_type": "on_demand",
        "metrics": {
            "action_log_dist": logprobs,
            "value": values,
            "next_value": (
                agent.get_value(next_obs)
                if step == num_steps - 1
                else None
            ),
            "rew": rb_rewards[step],
            "masks": torch.Tensor(1 - np.array(list(dones.values()))),
            "tasks": [env.unwrapped.task],
        },
    }
    curriculum.update_curriculum(update)
```
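My current working assumption (not yet checked against the DomainRandomization source) is that DR samples tasks uniformly and doesn't use these PLR-specific scores, so the update could probably be dropped or reduced to just reporting which task was played, along the lines of the sketch below:

```python
# Working assumption, not verified: DomainRandomization has no scoring
# function, so the value/return metrics above are likely unnecessary.
# If an update is sent at all, reporting only the completed task(s)
# should be enough; everything else mirrors the PLR snippet above.
if global_cycles % num_steps == 0:
    update = {
        "update_type": "on_demand",
        "metrics": {"tasks": [env.unwrapped.task]},
    }
    curriculum.update_curriculum(update)
```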
- Finally, I attempted to implement continuous PPO by using the CleanRL `ppo_continuous_action` architecture along with the `cleanrl_pettingzoo_pistonball_plr` training script (with minor adjustments).
I'm still fixing bugs and working through the script. For now, it seems that the `end_step` variable prevents the loop containing the backward pass from running: `end_step` is equal to 0, so `b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)` is empty and `for start in range(0, len(b_obs), batch_size)` doesn't iterate.
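To make the symptom concrete, here's a minimal, self-contained illustration (the tensor shapes are placeholders, not the real rollout buffer dimensions):

```python
import torch

# Minimal illustration of the end_step symptom (placeholder shapes): if
# end_step stays at 0, the flattened batch is empty and the minibatch loop
# body never executes, so no backward pass happens.
num_steps, n_agents, obs_dim = 128, 2, 4
rb_obs = torch.zeros((num_steps, n_agents, obs_dim))
batch_size = 32

end_step = 0  # never advanced past 0 in my current run
b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)
print(b_obs.shape)  # torch.Size([0, 4]) -> empty batch

for start in range(0, len(b_obs), batch_size):  # range(0, 0, 32): zero iterations
    pass  # PPO minibatch update / backward pass would go here but is skipped
```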
Could this be due to the fact that the `pistonball_plr` script is unfinished? (In hindsight, I should've chosen a training loop that wasn't in the experimental folder.)