Virtual Camera update: RLlib and ONNX support, added trained ONNX model
> [!IMPORTANT]
> This uses a mixture of plugin updates not yet in the main branch, including https://github.com/edbeeching/godot_rl_agents_plugin/pull/37 and https://github.com/edbeeching/godot_rl_agents_plugin/pull/40, plus the following modifications:
- Removed the negative step reward (it seems to learn much more quickly without it, possibly related to the relatively large magnitude of the main reward).
- Changed the image formatting to channel-last for RLlib (SB3 can likely work with both, but would need a larger image size).
- Changed the image size to 10x10, as it is one of the CNN formats supported by RLlib and is more lightweight.
- Manually changed which observation to read from the dict during inference in the Sync node code, set as:
  ```gdscript
  var action = model.run_inference(
      obs[agent_id]["camera_2d"], 1.0
  )
  ```
- Added a `training mode` inspector property to `RGBCameraSensor3D`. If true, it sends hex-encoded image data; otherwise it sends image data without hex encoding (needed for inference). Currently this needs to be set to `False` in `Player.tscn` when running ONNX inference, and `True` when training. A better solution is needed in the future for these manual changes, but it's a quick way to get things working for this example.
- Added the trained ONNX model; it was trained for just 68 seconds with a single env on my PC and then manually stopped, using the following RLlib example config (from the multi-agent Godot RL branch):
```yaml
algorithm: PPO

# Multi-agent-env setting:
# If true:
# - Any AIController with done = true will receive zeroes as action values until all AIControllers are done, an episode ends at that point.
# - ai_controller.needs_reset will also be set to true every time a new episode begins (but you can ignore it in your env if needed).
# If false:
# - AIControllers auto-reset in Godot and will receive actions after setting done = true.
# - Each AIController has its own episodes that can end/reset at any point.
# Set to false if you have a single policy name for all agents set in AIControllers
env_is_multiagent: false

checkpoint_frequency: 30

# You can set one or more stopping criteria
stop:
  #episode_reward_mean: 0
  #training_iteration: 1000
  #timesteps_total: 10000
  time_total_s: 10000000

config:
  env: godot
  env_config:
    env_path: "virtualcamera.console.exe" # Set your env path here (exported executable from Godot) - e.g. 'env_path.exe' on Windows
    action_repeat: null # Doesn't need to be set here, you can set this in sync node in Godot editor as well
    show_window: true # Displays game window while training. Might be faster when false in some cases, turning off also reduces GPU usage if you don't need rendering.
    speedup: 30 # Speeds up Godot physics

  framework: torch # ONNX models exported with torch are compatible with the current Godot RL Agents Plugin

  lr: 0.0003
  #lambda: 0.95
  #gamma: 0.99
  #vf_loss_coeff: 0.5
  vf_clip_param: .inf
  #clip_param: 0.2
  entropy_coeff: 0.0001
  entropy_coeff_schedule: null
  #grad_clip: 0.5

  normalize_actions: False
  clip_actions: True # During ONNX inference we simply clip the actions to [-1.0, 1.0] range, set here to match

  rollout_fragment_length: 32
  sgd_minibatch_size: 64
  num_workers: 1
  num_envs_per_worker: 1 # This will be set automatically if not multi-agent. If multi-agent, changing this changes how many envs to launch per worker.
  # The value below needs changing per env
  train_batch_size: 512 # Basic calculation for this value can be rollout_fragment_length * num_workers * num_envs_per_worker (how many AIControllers you have if not multi-agent, otherwise the value you set)

  num_sgd_iter: 4
  batch_mode: truncate_episodes

  num_gpus: 0
  model:
    vf_share_layers: False
    fcnet_hiddens: [64, 64]
```
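To illustrate the two data-handling points above (channel-last image layout and action clipping to match `clip_actions: True`), here is a minimal Python sketch using numpy. The function names and the 10x10 RGB observation are assumptions for illustration, not part of the plugin's API.

```python
import numpy as np

def to_channel_last(obs: np.ndarray) -> np.ndarray:
    """Convert a (channels, height, width) image to (height, width, channels),
    the channel-last layout used here for RLlib."""
    return np.transpose(obs, (1, 2, 0))

def clip_actions(actions: np.ndarray) -> np.ndarray:
    """Clip raw policy outputs to the [-1.0, 1.0] range,
    matching clip_actions: True in the config above."""
    return np.clip(actions, -1.0, 1.0)

# Hypothetical 10x10 RGB observation in channel-first layout:
obs_channel_first = np.zeros((3, 10, 10), dtype=np.float32)
obs = to_channel_last(obs_channel_first)
print(obs.shape)  # (10, 10, 3)

raw_actions = np.array([1.7, -2.3, 0.5], dtype=np.float32)
clipped = clip_actions(raw_actions)  # -> 1.0, -1.0, 0.5
```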
ONNX inference test video:
https://github.com/edbeeching/godot_rl_agents_examples/assets/61947090/16750152-048b-4de5-b5b3-2933afc58258
On a related note, the SB3 export guide has been updated to include preprocessing, so it may now be easier to export the model with SB3 as well; I will check at some point: https://stable-baselines3.readthedocs.io/en/master/guide/export.html#export-to-onnx. We might also need to adjust the export so it outputs only the action values rather than actions, values, and log-probs, since we don't use the other two and the current code in the multi-agent branch only supports two formats: our current exports from SB3 and RLlib.
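As a rough sketch of that action-only export adjustment, a small wrapper module in the style of the SB3 export guide could drop the extra outputs before calling `torch.onnx.export`. The `ActionOnlyPolicy` name and the assumed `(actions, values, log_probs)` return signature of the wrapped policy are hypothetical and would need checking against the actual SB3 policy.

```python
import torch

class ActionOnlyPolicy(torch.nn.Module):
    """Hypothetical wrapper exposing only the action output for ONNX export,
    dropping the value and log-prob outputs we don't use at inference time."""

    def __init__(self, policy: torch.nn.Module):
        super().__init__()
        self.policy = policy

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # Assumes the wrapped policy's forward returns (actions, values,
        # log_probs); only the actions are returned to the ONNX graph.
        actions, _values, _log_probs = self.policy(observation)
        return actions
```

The export itself would then look something like `torch.onnx.export(ActionOnlyPolicy(model.policy), dummy_obs, "model.onnx")`, with a dummy observation matching the env's observation shape.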