Syllabus
Multi_car_racing and Domain Randomization Progress
Here's the current state of my work on `multi_car_racing` and Domain Randomization:
Installation:
- Running the script still requires Docker for now. I also copied the `multi_car_racing` repository for convenience (for some reason, installing and importing it the usual way didn't work); this will be cleaned up later when we're ready to move on to later stages of the project.
- I upgraded PettingZoo to 1.23 and SuperSuit to 3.7.2, since PettingZoo < 1.23 had a typo preventing an import (`BaseParallelWraper` was renamed to `BaseParallelWrapper`).
- There was a circular import in `curriculum_sync_wrapper.py`:

```python
from syllabus.core import Curriculum, decorate_all_functions  # circular import
# replaced with:
from syllabus.core import Curriculum
from .utils import decorate_all_functions  # fixed the problem
```
Script:
- The task wrapper for `multi_car_racing` seems to work as expected.
- The curriculum setup runs without any errors:

```python
env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
curriculum = DomainRandomization(env.task_space)
curriculum, task_queue, update_queue = make_multiprocessing_curriculum(curriculum)
```
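For reference, my understanding from the Syllabus CleanRL examples (not verified against the version we're pinned to) is that sampled tasks reach the environment through a sync wrapper fed by these queues; the wrapper name and argument list below are assumptions recalled from those examples and may differ in detail:

```python
# Sketch of the env-side wiring, based on my reading of the Syllabus CleanRL
# examples; MultiProcessingSyncWrapper and its exact arguments are assumptions,
# not verified against the version we have pinned.
from syllabus.core import MultiProcessingSyncWrapper

def make_env():
    env = ...  # build the raw multi_car_racing env as before
    env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
    # Tasks sampled by the curriculum arrive via task_queue; progress/metrics
    # go back to the curriculum via update_queue.
    env = MultiProcessingSyncWrapper(
        env, task_queue, update_queue, task_space=env.task_space
    )
    return env
```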
- However, I'm still unsure of how to update the DR curriculum compared to PLR:

```python
# TODO: adapt to DR
if global_cycles % num_steps == 0:
    update = {
        "update_type": "on_demand",
        "metrics": {
            "action_log_dist": logprobs,
            "value": values,
            "next_value": (
                agent.get_value(next_obs)
                if step == num_steps - 1
                else None
            ),
            "rew": rb_rewards[step],
            "masks": torch.Tensor(1 - np.array(list(dones.values()))),
            "tasks": [env.unwrapped.task],
        },
    }
    curriculum.update_curriculum(update)
```
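My current working assumption (not yet checked against the DomainRandomization source) is that DR samples tasks uniformly and doesn't use these PLR-specific scores, so the update could probably be dropped or reduced to just reporting which task was played, along the lines of the sketch below:

```python
# Working assumption, not verified: DomainRandomization has no scoring
# function, so the value/return metrics above are likely unnecessary.
# If an update is sent at all, reporting only the completed task(s)
# should be enough; everything else mirrors the PLR snippet above.
if global_cycles % num_steps == 0:
    update = {
        "update_type": "on_demand",
        "metrics": {"tasks": [env.unwrapped.task]},
    }
    curriculum.update_curriculum(update)
```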
- Finally, I attempted to implement continuous PPO by using the CleanRL `ppo_continuous_action` architecture along with the `cleanrl_pettingzoo_pistonball_plr` training script (with minor adjustments).
I'm still fixing bugs and working through the script. For now, it seems that the `end_step` variable prevents the loop containing the backward pass from running: `end_step` is equal to 0, so `b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)` is empty and `for start in range(0, len(b_obs), batch_size)` doesn't iterate.
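To make the symptom concrete, here's a minimal, self-contained illustration (the tensor shapes are placeholders, not the real rollout buffer dimensions):

```python
import torch

# Minimal illustration of the end_step symptom (placeholder shapes): if
# end_step stays at 0, the flattened batch is empty and the minibatch loop
# body never executes, so no backward pass happens.
num_steps, n_agents, obs_dim = 128, 2, 4
rb_obs = torch.zeros((num_steps, n_agents, obs_dim))
batch_size = 32

end_step = 0  # never advanced past 0 in my current run
b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)
print(b_obs.shape)  # torch.Size([0, 4]) -> empty batch

for start in range(0, len(b_obs), batch_size):  # range(0, 0, 32): zero iterations
    pass  # PPO minibatch update / backward pass would go here but is skipped
```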
Could this be due to the fact that the `pistonball_plr` script is unfinished? (In hindsight, I should've chosen a training loop that wasn't in the experimental folder.)