lerobot
New port hil serl
What this does
- Fix the NaN issue in the action log probability by using a numerically more stable temperature computation.
- Fix the non-converging actor and critic losses by adding torch.inference_mode to the actor loss computation, so that the critic is updated only by the critic loss rather than by both the critic and actor losses.
- Fix the bug in the target critic update.
- Speed up offline dataset uploading by ~3x.
- Use pre-commit to reformat the code.
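For readers unfamiliar with the SAC pieces named in the list above, here is a minimal sketch of two of them. It is not the code from this PR, which the description does not show. It assumes a common approach to each: the temperature is stored as log(alpha) and exponentiated so it stays strictly positive, and the target critic is updated by Polyak averaging under torch.no_grad. The names Temperature, soft_update, critic, critic_target and the value of tau are hypothetical and chosen only for illustration.

```python
import copy
import math

import torch
from torch import nn


class Temperature(nn.Module):
    """Hypothetical helper: learns log(alpha) so alpha = exp(log_alpha) is always > 0."""

    def __init__(self, init_alpha: float = 1.0) -> None:
        super().__init__()
        # Optimising in log space avoids a negative or zero temperature.
        self.log_alpha = nn.Parameter(torch.tensor(math.log(init_alpha)))

    def forward(self) -> torch.Tensor:
        return self.log_alpha.exp()


def soft_update(target: nn.Module, online: nn.Module, tau: float) -> None:
    """Polyak-average the online parameters into the target network, in place."""
    with torch.no_grad():
        for t_param, o_param in zip(target.parameters(), online.parameters()):
            # target <- (1 - tau) * target + tau * online
            t_param.mul_(1.0 - tau).add_(o_param, alpha=tau)


# Hypothetical stand-in for a critic network (assumption, not the PR's model).
critic = nn.Linear(4, 1)
critic_target = copy.deepcopy(critic)
critic_target.requires_grad_(False)  # the target is never trained by gradients

# Pretend an optimizer step moved every online critic parameter by +1.
with torch.no_grad():
    for p in critic.parameters():
        p.add_(1.0)

before = [p.clone() for p in critic_target.parameters()]
soft_update(critic_target, critic, tau=0.005)

temperature = Temperature(init_alpha=1.0)
```

With tau=0.005, each target parameter moves 0.5% of the way toward the online critic per call, which keeps the bootstrapped Q-targets slowly varying.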
How it was tested
It was tested on the pusht_keypoints dataset.
Examples:
Run the following command:
python scripts/train.py policy=sac_pusht_keypoints env=pusht +dataset=lerobot/pusht_keypoints wandb.enable=true