lerobot

New port of HIL-SERL

Status: Open. Ke-Wang1017 opened this issue 1 year ago (0 comments).

What this does

  1. Fix NaN values in the action log-probability by computing the temperature in a numerically more stable way.
  2. Fix the non-converging actor and critic losses by adding torch.inference_mode to the actor-loss computation, so that the critic is updated only by the critic loss instead of by both the critic and actor losses.
  3. Fix a bug in the target-critic update.
  4. Speed up offline-dataset uploading by about 3x.
  5. Reformat the code with pre-commit.
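For item 1, the PR does not show its exact temperature code. The sketch below illustrates one common way to keep the SAC temperature numerically well behaved, as an assumption rather than the PR's implementation. It optimizes log(alpha) and exponentiates it, so the temperature is always strictly positive and finite for reasonable values. `ACTION_DIM`, `TARGET_ENTROPY`, `temperature_loss`, and `temperature` are hypothetical names chosen for illustration.

```python
import torch

# Hypothetical sizes; SAC commonly sets the target entropy to -action_dim.
ACTION_DIM = 2
TARGET_ENTROPY = -float(ACTION_DIM)

# Learn log(alpha) instead of alpha: exp(log_alpha) can never become zero or negative.
log_alpha = torch.zeros(1, requires_grad=True)
temperature_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)


def temperature_loss(log_probs: torch.Tensor) -> torch.Tensor:
    # log_probs are detached so that only log_alpha receives gradients from this loss.
    return -(log_alpha * (log_probs.detach() + TARGET_ENTROPY)).mean()


def temperature() -> torch.Tensor:
    # Temperature used in the actor/critic losses; detached so those losses do not train log_alpha.
    return log_alpha.exp().detach()
```

A training step would compute `temperature_loss(log_probs)`, call `backward()`, and step `temperature_optimizer`, then read `temperature()` when building the actor and critic losses.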
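Item 2 aims to keep the actor loss from updating the critic. The sketch below shows that goal with a swapped-in technique: it temporarily sets `requires_grad_(False)` on the critic instead of using `torch.inference_mode`. Operations run under `torch.inference_mode` record no autograd graph, so Q-values computed that way cannot send gradients back to the actor through the action. The tiny `actor` and `critic` networks, `OBS_DIM`, and `ACTION_DIM` are hypothetical. The actor is deterministic and the SAC entropy term is omitted for brevity.

```python
import torch
from torch import nn

# Hypothetical tiny networks; the real policy and critic architectures are not shown in this PR.
OBS_DIM, ACTION_DIM = 4, 2
actor = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + ACTION_DIM, 32), nn.ReLU(), nn.Linear(32, 1))


def actor_loss(observations: torch.Tensor) -> torch.Tensor:
    actions = actor(observations)
    # Freeze critic weights while scoring the actor's actions: gradients still flow
    # through `actions` into the actor, but no .grad accumulates on the critic.
    critic.requires_grad_(False)
    try:
        q_values = critic(torch.cat([observations, actions], dim=-1)).squeeze(-1)
    finally:
        # Re-enable gradients so the critic loss can still train the critic.
        critic.requires_grad_(True)
    # Maximize Q by minimizing its negative (entropy term omitted in this sketch).
    return -q_values.mean()
```

Giving the actor optimizer only `actor.parameters()` also prevents an actor step from moving critic weights. Freezing the critic additionally keeps stale actor-loss gradients out of the critic's `.grad` buffers.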
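Item 3 does not say what the target-critic bug was. For reference, a standard soft (Polyak) target update moves each target parameter a fraction `tau` toward the online critic: target = (1 - tau) * target + tau * online. The sketch below is a minimal illustration under that assumption, not the PR's code. `TAU`, `update_target`, and the one-layer `critic` are hypothetical.

```python
import copy

import torch
from torch import nn

# Hypothetical Polyak coefficient; the PR's configured value is not stated here.
TAU = 0.005

critic = nn.Linear(6, 1)
target_critic = copy.deepcopy(critic)  # same architecture, so parameters zip in the same order
target_critic.requires_grad_(False)    # the target network is never trained by backprop


@torch.no_grad()
def update_target(online: nn.Module, target: nn.Module, tau: float) -> None:
    # In place: target <- target + tau * (online - target) == (1 - tau) * target + tau * online.
    for p_target, p_online in zip(target.parameters(), online.parameters()):
        p_target.lerp_(p_online, tau)
```

This is typically called once after each critic optimizer step, for example `update_target(critic, target_critic, TAU)`. Only parameters are averaged. Buffers such as BatchNorm running statistics would need separate handling.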

How it was tested

Tested on the pusht_keypoints dataset.

Example: run

    python scripts/train.py policy=sac_pusht_keypoints env=pusht +dataset=lerobot/pusht_keypoints wandb.enable=true

Ke-Wang1017, Jan 06 '25 10:01