
Using Mujoco with ML Agents

Open michelhabib opened this issue 3 years ago • 4 comments

The way the MuJoCo Unity plugin currently works is simply to attach MuJoCo at runtime and execute physics on MuJoCo nodes. That is a good approach for running a simulation at runtime, but it doesn't help with running ML-Agents training. ML-Agents training is probably the main reason researchers and game developers would benefit from MuJoCo being open sourced and available in Unity as a plugin.

In order for ML-Agents training to work properly, we need the means to:
1. Load a scene
2. Reset a scene (for running episodes)
3. Handle multiple scenes (for multi-agent training)

Currently, I can probably do this by modifying the MjScene.cs file, but that is neither ideal nor reusable, so I am hoping future versions will expose such functionality.

I will keep playing with it anyway and will let you know here if I reach a solid approach or have further ideas.
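As a rough illustration of what such an exposed reset hook could look like from the ML-Agents side (a sketch only: the `ResetScene` method here is hypothetical, standing in for whatever the plugin might eventually expose):

```csharp
using Unity.MLAgents;
using Mujoco;

public class MjWalkerAgent : Agent
{
    // Called by ML-Agents at the start of every training episode.
    public override void OnEpisodeBegin()
    {
        // Hypothetical plugin API: restore the MuJoCo scene to its initial
        // state without tearing down the Unity hierarchy.
        // MjScene.Instance.ResetScene();
    }
}
```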

michelhabib avatar Jul 28 '22 14:07 michelhabib

In terms of resetting a scene, the method I've been using is storing the kinematic state of the ragdoll (or copying it from another one) in an enumerable, and restoring it similarly to how MjScene recreates a scene when a reset is needed (i.e. on episode begin).

Example static functions to use:
public static unsafe (IEnumerable<double[]>, IEnumerable<double[]>) GetMjKinematics(MjBody rootBody)
{
    MujocoLib.mjModel_* Model = MjScene.Instance.Model;
    MujocoLib.mjData_* Data = MjScene.Instance.Data;
    var joints = rootBody.GetComponentsInChildren<MjBaseJoint>().OrderBy(j=>j.MujocoName);
    var positions = new List<double[]>();
    var velocities = new List<double[]>();
    foreach (var joint in joints)
    {
        switch (Model->jnt_type[joint.MujocoId])
        {
            // Hinge and slide joints: 1 qpos entry, 1 dof.
            default:
            case (int)MujocoLib.mjtJoint.mjJNT_HINGE:
            case (int)MujocoLib.mjtJoint.mjJNT_SLIDE:
                positions.Add(new double[] { Data->qpos[joint.QposAddress] });
                velocities.Add(new double[] { Data->qvel[joint.DofAddress] });
                break;
            // Ball joint: quaternion orientation (4 qpos), angular velocity (3 dof).
            case (int)MujocoLib.mjtJoint.mjJNT_BALL:
                positions.Add(new double[] { Data->qpos[joint.QposAddress],
                                             Data->qpos[joint.QposAddress+1],
                                             Data->qpos[joint.QposAddress+2],
                                             Data->qpos[joint.QposAddress+3]});

                velocities.Add(new double[] { Data->qvel[joint.DofAddress],
                                              Data->qvel[joint.DofAddress+1],
                                              Data->qvel[joint.DofAddress+2]});
                break;
            // Free joint: 3D position + quaternion (7 qpos), linear + angular velocity (6 dof).
            case (int)MujocoLib.mjtJoint.mjJNT_FREE:
                positions.Add(new double[] {
                                                Data->qpos[joint.QposAddress],
                                                Data->qpos[joint.QposAddress+1],
                                                Data->qpos[joint.QposAddress+2],
                                                Data->qpos[joint.QposAddress+3],
                                                Data->qpos[joint.QposAddress+4],
                                                Data->qpos[joint.QposAddress+5],
                                                Data->qpos[joint.QposAddress+6]});
                velocities.Add( new double[] {
                                                Data->qvel[joint.DofAddress],
                                                Data->qvel[joint.DofAddress+1],
                                                Data->qvel[joint.DofAddress+2],
                                                Data->qvel[joint.DofAddress+3],
                                                Data->qvel[joint.DofAddress+4],
                                                Data->qvel[joint.DofAddress+5]});
                break;
        }
    }
    return (positions, velocities);
}

public static unsafe void SetMjKinematics(this MjScene mjScene, MjBody rootBody, IEnumerable<double[]> positions, IEnumerable<double[]> velocities)
{
    MujocoLib.mjModel_* Model = MjScene.Instance.Model;
    MujocoLib.mjData_* Data = MjScene.Instance.Data;
    var joints = rootBody.GetComponentsInChildren<MjBaseJoint>().OrderBy(j=>j.MujocoName);
    foreach ((var joint, (var position, var velocity)) in joints.Zip( positions.Zip(velocities, Tuple.Create), Tuple.Create))
    {
        switch (Model->jnt_type[joint.MujocoId])
        {
            default:
            case (int)MujocoLib.mjtJoint.mjJNT_HINGE:
            case (int)MujocoLib.mjtJoint.mjJNT_SLIDE:
                Data->qpos[joint.QposAddress] = position[0];
                Data->qvel[joint.DofAddress] = velocity[0];
                break;
            case (int)MujocoLib.mjtJoint.mjJNT_BALL:
                Data->qpos[joint.QposAddress] = position[0];
                Data->qpos[joint.QposAddress + 1] = position[1];
                Data->qpos[joint.QposAddress + 2] = position[2];
                Data->qpos[joint.QposAddress + 3] = position[3];
                Data->qvel[joint.DofAddress] = velocity[0];
                Data->qvel[joint.DofAddress + 1] = velocity[1];
                Data->qvel[joint.DofAddress + 2] = velocity[2];
                break;
            case (int)MujocoLib.mjtJoint.mjJNT_FREE:
                Data->qpos[joint.QposAddress] = position[0];
                Data->qpos[joint.QposAddress + 1] = position[1];
                Data->qpos[joint.QposAddress + 2] = position[2];
                Data->qpos[joint.QposAddress + 3] = position[3];
                Data->qpos[joint.QposAddress + 4] = position[4];
                Data->qpos[joint.QposAddress + 5] = position[5];
                Data->qpos[joint.QposAddress + 6] = position[6];
                Data->qvel[joint.DofAddress] = velocity[0];
                Data->qvel[joint.DofAddress + 1] = velocity[1];
                Data->qvel[joint.DofAddress + 2] = velocity[2];
                Data->qvel[joint.DofAddress + 3] = velocity[3];
                Data->qvel[joint.DofAddress + 4] = velocity[4];
                Data->qvel[joint.DofAddress + 5] = velocity[5];
                break;
        }

    }
    // update mj transforms:
    MujocoLib.mj_kinematics(Model, Data);
    mjScene.SyncUnityToMjState();
}
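A possible usage pattern for the two functions above (a sketch; it assumes they live in a static class — `MjKinematicsUtils` is a placeholder name — and that the reference pose is captured once after the scene has been built):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Mujoco;

public class RagdollAgent : Agent
{
    public MjBody rootBody;
    private IEnumerable<double[]> initialQpos;
    private IEnumerable<double[]> initialQvel;

    void Start()
    {
        // Capture the reference kinematic state once.
        (initialQpos, initialQvel) = MjKinematicsUtils.GetMjKinematics(rootBody);
    }

    public override void OnEpisodeBegin()
    {
        // Restore joint positions/velocities instead of rebuilding the scene.
        MjScene.Instance.SetMjKinematics(rootBody, initialQpos, initialQvel);
    }
}
```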

Balint-H avatar Jul 30 '22 12:07 Balint-H

In terms of loading a scene, the plugin works somewhat well with prefabs. You can store a scene in one and instantiate it during runtime, though with weird behaviour if you create multiple instances. I agree, better support for prefabs and prefab instantiation could be a way to handle all three issues simultaneously. To restart you could just destroy and reinstantiate (potentially wasting some resources), and for multiple scenes you could instantiate with an offset.
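The instantiate-with-an-offset idea could be sketched like this (component and field names are illustrative; note the caveat above about odd behaviour with multiple instances):

```csharp
using UnityEngine;

public class EnvSpawner : MonoBehaviour
{
    public GameObject envPrefab;   // prefab containing one MuJoCo environment
    public int numEnvs = 4;
    public float spacing = 10f;

    void Start()
    {
        for (int i = 0; i < numEnvs; i++)
        {
            // Offset each copy along x so the environments don't overlap.
            Instantiate(envPrefab, new Vector3(i * spacing, 0f, 0f), Quaternion.identity);
        }
    }
}
```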

Lastly, multi-agent learning works with ML-Agents via the num-envs argument when running from a build. This wastes some resources by running multiple instances of MuJoCo in the background, but it works. You can also place multiple agents/robots in the scene, and I had success training with them, with a slight speedup compared to relying on num-envs alone.
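For reference, the num-envs approach is a standard `mlagents-learn` invocation along these lines (the config file, build path and run-id are placeholders):

```shell
# Launch 8 copies of the built environment in parallel.
mlagents-learn config.yaml --env=Builds/MyMujocoEnv --num-envs=8 --run-id=mujoco_run
```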

Balint-H avatar Jul 30 '22 12:07 Balint-H

In terms of resetting a scene, the method I've been using is storing the kinematic state of the ragdoll (or copying it from another one) in an enumerable, and restoring it similarly to how MjScene recreates a scene when a reset is needed (i.e. on episode begin).

Example static functions to use:

Thanks @Balint-H for the tip! Very much appreciated :) By inspecting the MjScene.cs file I got a few ideas similar to this one. It is most probably ideal to save the state as you mentioned and make sure to call SyncUnityToMjState. I did it forcefully with a DestroyScene/CreateScene, the easy but unoptimized way. It works well so far, but I will consider your approach when building a multi-agent setup.
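The "forceful" reset described above could be sketched as follows (assuming the DestroyScene/CreateScene names mentioned correspond to public methods on MjScene; treat this as a rough outline rather than a verified implementation):

```csharp
using Unity.MLAgents;
using Mujoco;

public class SceneRebuildAgent : Agent
{
    public override void OnEpisodeBegin()
    {
        // Tear the MuJoCo scene down and recompile it from the current
        // Unity hierarchy. Simple and robust, but pays the full
        // scene-compilation cost on every episode.
        MjScene.Instance.DestroyScene();
        MjScene.Instance.CreateScene();
    }
}
```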

michelhabib avatar Jul 30 '22 21:07 michelhabib

Lastly, multi-agent learning works with ML-Agents via the num-envs argument when running from a build. This wastes some resources by running multiple instances of MuJoCo in the background, but it works. You can also place multiple agents/robots in the scene, and I had success training with them, with a slight speedup compared to relying on num-envs alone.

I am trying a slightly different approach: training multiple agents simultaneously in the same scene, like the classic 3DBall example. This approach speeds up training a lot in some scenarios, and it should mean one instance of MuJoCo with multiple MuJoCo envs. I am using a Model/Data array instead of a single Model/Data. The POC seems to work, but the code needs some modification to run efficiently. I will try to share the code once it works, hopefully soon :) Cheers
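A conceptual sketch of the Model/Data-array idea (class and field names are hypothetical; it assumes one compiled model shared across N independent data instances, stepped separately, using the MuJoCo C API's mj_makeData/mj_step through the plugin's bindings):

```csharp
public unsafe class MultiEnvStepper
{
    MujocoLib.mjModel_* model;     // one shared compiled model
    MujocoLib.mjData_*[] datas;    // one mjData (physics state) per environment

    public MultiEnvStepper(MujocoLib.mjModel_* sharedModel, int numEnvs)
    {
        model = sharedModel;
        datas = new MujocoLib.mjData_*[numEnvs];
        for (int i = 0; i < numEnvs; i++)
        {
            // Each environment gets its own independent physics state.
            datas[i] = MujocoLib.mj_makeData(model);
        }
    }

    public void StepAll()
    {
        // Advance every environment by one physics step.
        foreach (var data in datas)
        {
            MujocoLib.mj_step(model, data);
        }
    }
}
```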

michelhabib avatar Jul 30 '22 22:07 michelhabib

Lastly multi agent learning works with ML-Agents with the num-envs argument when running from build. This wastes some resources due to running multiple instances of mujoco in the background, but works. You can also place multiple agents/robots in the scene, and I had success with training with them with a slight speedup compared to just relying on num-envs.

I am trying a slightly different approach: training multiple agents simultaneously in the same scene, like the classic 3DBall example. This approach speeds up training a lot in some scenarios, and it should mean one instance of MuJoCo with multiple MuJoCo envs. I am using a Model/Data array instead of a single Model/Data. The POC seems to work, but the code needs some modification to run efficiently. I will try to share the code once it works, hopefully soon :) Cheers

Hi @michelhabib, what you have described sounds like it could work and would be very useful. Did you manage to get it to work in the end?

breakds avatar May 09 '23 22:05 breakds