New port hil serl

Open Ke-Wang1017 opened this issue 1 year ago • 0 comments

What this does

Fix NaN values in the action log probability by using a numerically more stable temperature computation.
Fix the non-converging actor and critic losses by adding torch.inference_mode to the actor loss computation, so that the critic is updated only by the critic loss and not by the actor loss as well.
Fix a bug in the target critic update.
Speed up offline dataset uploading by ~3x.
Use pre-commit to reformat the code.
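
To make the first three items concrete, here is a minimal sketch of the general ideas in PyTorch. It is not the code from this PR. The toy networks, `sample_action`, `actor_loss`, `temperature_loss` and `soft_update` are hypothetical names chosen for illustration. Keeping the temperature in log space and using the softplus form of the tanh correction are common SAC choices that I am assuming here, and the critic is kept out of the actor update by temporarily freezing its parameters with `requires_grad_(False)` rather than with torch.inference_mode.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
obs_dim, act_dim = 4, 2

# Toy stand-ins for the real networks (hypothetical, for illustration only).
actor = nn.Linear(obs_dim, 2 * act_dim)  # outputs mean and log_std
critic = nn.Linear(obs_dim + act_dim, 1)
target_critic = nn.Linear(obs_dim + act_dim, 1)
target_critic.load_state_dict(critic.state_dict())

# Assumption: the temperature is stored in log space, so exp() keeps it positive.
log_alpha = torch.zeros(1, requires_grad=True)


def sample_action(obs):
    mean, log_std = actor(obs).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.clamp(-5.0, 2.0).exp())
    u = dist.rsample()  # reparameterized sample
    action = torch.tanh(u)
    # log(1 - tanh(u)^2) written as 2 * (log 2 - u - softplus(-2u)),
    # which stays finite for large |u| where the naive form gives log(0).
    correction = 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))
    log_prob = (dist.log_prob(u) - correction).sum(dim=-1)
    return action, log_prob


def actor_loss(obs):
    action, log_prob = sample_action(obs)
    # Freeze the critic weights for this forward pass: gradients still flow
    # through the critic to the action, but not into the critic parameters.
    for p in critic.parameters():
        p.requires_grad_(False)
    q = critic(torch.cat([obs, action], dim=-1)).squeeze(-1)
    for p in critic.parameters():
        p.requires_grad_(True)
    return (log_alpha.exp().detach() * log_prob - q).mean(), log_prob


def temperature_loss(log_prob, target_entropy):
    # Only log_alpha receives a gradient; the policy term is detached.
    return -(log_alpha.exp() * (log_prob.detach() + target_entropy)).mean()


@torch.no_grad()
def soft_update(target, source, tau=0.005):
    # Polyak averaging: target <- (1 - tau) * target + tau * source
    for t, s in zip(target.parameters(), source.parameters()):
        t.mul_(1.0 - tau).add_(s, alpha=tau)


obs = torch.randn(8, obs_dim)
loss, log_prob = actor_loss(obs)
loss.backward()  # populates actor gradients, leaves critic gradients empty
temperature_loss(log_prob, target_entropy=-float(act_dim)).backward()
soft_update(target_critic, critic)
```

In a full training loop the critic loss would be computed and stepped separately with its own optimizer, and `soft_update` would run after each critic step.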

It was tested on the pusht_keypoints dataset.

Example: run the following command:

python scripts/train.py policy=sac_pusht_keypoints env=pusht +dataset=lerobot/pusht_keypoints wandb.enable=true

Jan 06 '25 10:01 Ke-Wang1017