Poor training performance (3.3% success rate) with pi0 model on locally downloaded official LIBERO dataset
I'm seeing extremely poor performance when training the pi0 model with PyTorch on a locally downloaded copy of the official LIBERO dataset. After training, the model achieves only a ~3.3% success rate, far below the expected performance.
My Personal Training Configuration:

```python
TrainConfig(
    name="pi0_lyh_libero_lora_official_full",
    model=pi0_config.Pi0Config(),
    # Data configuration
    data=LeRobotLiberoDataConfig(
        # Local dataset downloaded from physical_intelligence/libero
        repo_id="/opt/liblibai-models/user-workspace2/dataset/libero_dataset",
        # local_files_only=True,
        base_config=DataConfig(
            prompt_from_task=True,  # Load task descriptions from the dataset's task field
        ),
        extra_delta_transform=True,  # Additional transforms required for LIBERO
    ),
    # Load converted PyTorch base model weights
    weight_loader=weight_loaders.CheckpointWeightLoader(
        "/opt/liblibai-models/user-workspace2/users/lyh/model_checkpoint/pi0/pytorch/pi0_base_lora"
    ),
    # Training hyperparameters
    num_train_steps=50_000,
    batch_size=32,  # Adjusted based on VRAM
    pytorch_training_precision="bfloat16",  # or "float32" if encountering numerical issues
    # LoRA-specific settings
    # freeze_filter=pi0_config.Pi0Config().get_freeze_filter(),
    # Disable EMA (not needed for LoRA fine-tuning)
    ema_decay=None,
    # Optional: enable gradient checkpointing to save VRAM
    # gradient_checkpointing=True,
)
```
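In the config above I left `freeze_filter` commented out and kept the default `Pi0Config()` (non-LoRA variants). Based on the JAX LoRA example configs in this repo, I would have expected the LoRA pieces to look roughly like the sketch below, but I'm not sure whether the PyTorch training path uses the same variant names and fields, so this is only my assumption:

```python
# Sketch of the LoRA setup I assumed, modeled on the JAX LoRA example configs in openpi.
# Not verified against the PyTorch training path; treat the variant names/fields as assumptions there.
lora_model = pi0_config.Pi0Config(
    paligemma_variant="gemma_2b_lora",        # LoRA variant of the PaliGemma backbone
    action_expert_variant="gemma_300m_lora",  # LoRA variant of the action expert
)

lora_overrides = dict(
    model=lora_model,
    # Freeze everything except the LoRA adapter weights.
    freeze_filter=lora_model.get_freeze_filter(),
    # EMA is disabled for LoRA fine-tuning in the example configs.
    ema_decay=None,
)
```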
Expected Behavior: The model should achieve success rates comparable to the official pi0 results on LIBERO tasks.

Actual Behavior: Training yields only a ~3.3% success rate, which suggests a fundamental issue with the training setup.

Questions/Requests:

1. Is there an issue with how I'm loading the locally downloaded dataset? (The sanity check I've been using is sketched below.)
2. Are any critical configuration parameters missing or set incorrectly?
3. Could the LoRA configuration be causing this poor performance?
4. What is the expected success rate baseline for pi0 on LIBERO, and what training steps/configuration are recommended to achieve it?

Any guidance on debugging this performance issue would be greatly appreciated.
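For the first question, this is roughly the sanity check I've been running to confirm that the local copy resolves and that task prompts and actions look sensible. It assumes the lerobot `LeRobotDataset` API (the `root` argument and `meta.tasks` attribute may differ between lerobot versions), so treat it as a sketch rather than a definitive check:

```python
# Minimal sanity check for the locally downloaded LIBERO dataset.
# Assumes the lerobot LeRobotDataset API; argument and attribute names may differ by version.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset(
    repo_id="physical_intelligence/libero",  # upstream repo id (assumed); my config passes the local path instead
    root="/opt/liblibai-models/user-workspace2/dataset/libero_dataset",
)

print("episodes:", dataset.num_episodes, "frames:", dataset.num_frames)
print("tasks:", dataset.meta.tasks)  # should contain natural-language LIBERO task descriptions

sample = dataset[0]
for key, value in sample.items():
    shape = getattr(value, "shape", None)
    print(key, shape if shape is not None else type(value))
# I expect image, state, and action entries plus a task/prompt field; if the task strings
# are missing or the action values look un-normalized, that could explain the 3.3% result.
```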