Virtual Camera update: RLlib and ONNX support, added trained ONNX model
> [!IMPORTANT]
> This uses a mixture of plugin updates not yet in the main branch, including https://github.com/edbeeching/godot_rl_agents_plugin/pull/37 and https://github.com/edbeeching/godot_rl_agents_plugin/pull/40, plus the following modifications:
- Removed the negative step reward (it seems to learn much more quickly without it, possibly related to the relatively large magnitude of the main reward).
- Changed the image formatting to channel-last for RLlib (SB3 can likely work with both, but would need a larger image size).
- Changed the image size to 10x10, as it is one of the CNN formats supported by RLlib and is more lightweight.
- Manually changed which observation to read from the dict during inference in the Sync node code, set as:
  ```gdscript
  var action = model.run_inference(
      obs[agent_id]["camera_2d"], 1.0
  )
  ```
- Added a `training mode` inspector property to `RGBCameraSensor3D`. If true, it sends hex-encoded image data; otherwise it sends image data without hex encoding (needed for inference). Currently this needs to be set to `False` in `Player.tscn` when running ONNX inference, and `True` when training. A better solution is needed in the future for these manual changes, but it's a quick way to get things working for this example.
- Added the trained ONNX model; it was trained for just 68 seconds with a single env on my PC and then manually stopped, using the following RLlib example config (from the multi-agent Godot RL branch):
```yaml
algorithm: PPO

# Multi-agent-env setting:
# If true:
# - Any AIController with done = true will receive zeroes as action values until all AIControllers are done, an episode ends at that point.
# - ai_controller.needs_reset will also be set to true every time a new episode begins (but you can ignore it in your env if needed).
# If false:
# - AIControllers auto-reset in Godot and will receive actions after setting done = true.
# - Each AIController has its own episodes that can end/reset at any point.
# Set to false if you have a single policy name for all agents set in AIControllers
env_is_multiagent: false

checkpoint_frequency: 30

# You can set one or more stopping criteria
stop:
  #episode_reward_mean: 0
  #training_iteration: 1000
  #timesteps_total: 10000
  time_total_s: 10000000

config:
  env: godot
  env_config:
    env_path: "virtualcamera.console.exe" # Set your env path here (exported executable from Godot) - e.g. 'env_path.exe' on Windows
    action_repeat: null # Doesn't need to be set here, you can set this in sync node in Godot editor as well
    show_window: true # Displays game window while training. Might be faster when false in some cases, turning off also reduces GPU usage if you don't need rendering.
    speedup: 30 # Speeds up Godot physics

  framework: torch # ONNX models exported with torch are compatible with the current Godot RL Agents Plugin

  lr: 0.0003
  #lambda: 0.95
  #gamma: 0.99
  #vf_loss_coeff: 0.5
  vf_clip_param: .inf
  #clip_param: 0.2
  entropy_coeff: 0.0001
  entropy_coeff_schedule: null
  #grad_clip: 0.5

  normalize_actions: False
  clip_actions: True # During ONNX inference we simply clip the actions to [-1.0, 1.0] range, set here to match

  rollout_fragment_length: 32
  sgd_minibatch_size: 64
  num_workers: 1
  num_envs_per_worker: 1 # This will be set automatically if not multi-agent. If multi-agent, changing this changes how many envs to launch per worker.
  # The value below needs changing per env
  train_batch_size: 512 # Basic calculation for this value can be rollout_fragment_length * num_workers * num_envs_per_worker (how many AIControllers you have if not multi-agent, otherwise the value you set)

  num_sgd_iter: 4
  batch_mode: truncate_episodes

  num_gpus: 0
  model:
    vf_share_layers: False
    fcnet_hiddens: [64, 64]
```
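To illustrate the two data-handling points above (channel-last image layout and action clipping to match `clip_actions: True`), here is a minimal Python sketch using numpy. The function names and the 10x10 RGB observation are assumptions for illustration, not part of the plugin's API.

```python
import numpy as np

def to_channel_last(obs: np.ndarray) -> np.ndarray:
    """Convert a (channels, height, width) image to (height, width, channels),
    the channel-last layout used here for RLlib."""
    return np.transpose(obs, (1, 2, 0))

def clip_actions(actions: np.ndarray) -> np.ndarray:
    """Clip raw policy outputs to the [-1.0, 1.0] range,
    matching clip_actions: True in the config above."""
    return np.clip(actions, -1.0, 1.0)

# Hypothetical 10x10 RGB observation in channel-first layout:
obs_channel_first = np.zeros((3, 10, 10), dtype=np.float32)
obs = to_channel_last(obs_channel_first)
print(obs.shape)  # (10, 10, 3)

raw_actions = np.array([1.7, -2.3, 0.5], dtype=np.float32)
clipped = clip_actions(raw_actions)  # -> 1.0, -1.0, 0.5
```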
ONNX inference test video:
https://github.com/edbeeching/godot_rl_agents_examples/assets/61947090/16750152-048b-4de5-b5b3-2933afc58258
On a related note, the SB3 export guide has been updated to include preprocessing, so it may now be easier to export the model with SB3 as well; I will check at some point: https://stable-baselines3.readthedocs.io/en/master/guide/export.html#export-to-onnx. We might also need to adjust the export so it outputs only the action values rather than actions, values, and log-probs, since we don't use the other two and the current code in the multi-agent branch only supports two formats: our current exports from SB3 and RLlib.
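As a rough sketch of that action-only export adjustment, a small wrapper module in the style of the SB3 export guide could drop the extra outputs before calling `torch.onnx.export`. The `ActionOnlyPolicy` name and the assumed `(actions, values, log_probs)` return signature of the wrapped policy are hypothetical and would need checking against the actual SB3 policy.

```python
import torch

class ActionOnlyPolicy(torch.nn.Module):
    """Hypothetical wrapper exposing only the action output for ONNX export,
    dropping the value and log-prob outputs we don't use at inference time."""

    def __init__(self, policy: torch.nn.Module):
        super().__init__()
        self.policy = policy

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # Assumes the wrapped policy's forward returns (actions, values,
        # log_probs); only the actions are returned to the ONNX graph.
        actions, _values, _log_probs = self.policy(observation)
        return actions
```

The export itself would then look something like `torch.onnx.export(ActionOnlyPolicy(model.policy), dummy_obs, "model.onnx")`, with a dummy observation matching the env's observation shape.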