openpi
Fine-tuning the PI0 base model on the DROID dataset fails when computing norm stats
Hi, I'm trying to fine-tune the PI0 base model on the DROID dataset. I noticed that the only DROID fine-tuning TrainConfig is for PI0-FAST, so I defined a PI0 fine-tuning TrainConfig as follows:
#
# Fine-tuning droid configs.
#
TrainConfig(
    name="pi0_droid_finetune",
    model=pi0.Pi0Config(),
    data=RLDSDroidDataConfig(
        repo_id="droid",
        # Set this to the path to your DROID RLDS dataset (the parent directory of the `droid` directory).
        rlds_data_dir="/root/",
        action_space=droid_rlds_dataset.DroidActionSpace.JOINT_POSITION,
    ),
    weight_loader=weight_loaders.CheckpointWeightLoader("/root/openpi/pi0_base/params"),
    num_train_steps=30_000,
    num_workers=0
)
The problem is that when I try to compute the normalization statistics, the program hangs for about 10 seconds and then appears to get killed. The output is as follows:
(base) root:~/openpi# uv run --group rlds scripts/compute_norm_stats.py --config-name pi0_droid_finetune --max-frames 10_000_000
2025-07-10 17:35:44.061619: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-07-10 17:35:44.061651: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-07-10 17:35:44.062929: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-07-10 17:35:45.001471: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Computing stats: 0%| | 0/312500 [00:00<?, ?it/s]
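One way to check whether the run is really being killed by a signal (for example by the kernel's out-of-memory killer, which is only a guess at this point) is to wrap the command and inspect its exit status. Below is a minimal sketch, assuming a POSIX system; `describe_exit` and `run_and_report` are hypothetical helper names and are not part of openpi.

```python
import signal
import subprocess
from typing import List, Optional


def _signal_name(number: int) -> Optional[str]:
    """Return the name of a signal number, or None if it is not a known signal."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def describe_exit(returncode: int) -> str:
    """Turn a process return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative return code.
        name = _signal_name(-returncode)
        if name is not None:
            return f"killed by {name}"
    if returncode > 128:
        # Shell convention: a wrapper may report 128 + signal number (137 = SIGKILL).
        name = _signal_name(returncode - 128)
        if name is not None:
            return f"killed by {name} (shell-style status {returncode})"
    return f"exited with status {returncode}"


def run_and_report(cmd: List[str]) -> str:
    """Run a command, let it print as usual, and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

For the command above, this could be called as `run_and_report(["uv", "run", "--group", "rlds", "scripts/compute_norm_stats.py", "--config-name", "pi0_droid_finetune", "--max-frames", "10_000_000"])`. A `SIGKILL` result would be consistent with the process running out of memory, but the kernel log (e.g. `dmesg`) would still need to be checked to confirm the cause.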