openpi
Lack of adaptability and language grounding after Pi0.5 fine-tuning
We fine-tuned the Pi0.5 diffusion model on our LeRobot dataset, which consists of videos showing a robotic arm picking up objects from a designated area and placing them into a box (a relatively simple setting).
We conducted multiple experiments and obtained acceptable results in terms of precision and overall success rate for completing the object sorting task. However, we observed two major limitations:
- Lack of adaptability to environmental changes. For instance, if the objects are initially on the left side of the robotic arm and the container box is on the right, the model performs well. But if we swap their positions, the robot consistently fails, often attempting completely random actions (e.g., trying to pick up the box).
- Absence of effective language conditioning. Regardless of the input prompt, the model always performs the same task, showing no response to textual variations.
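One way to quantify the second limitation is to feed the policy the same observation under two different prompts and measure how far apart the predicted action chunks are; a value near zero confirms the prompts are being ignored. Below is a minimal, library-agnostic sketch. The array shapes and the synthetic data are illustrative only, and this is not an openpi API, just a generic metric you would apply to whatever inference call your setup uses.

```python
import numpy as np

def prompt_sensitivity(actions_a: np.ndarray, actions_b: np.ndarray) -> float:
    """Mean per-step L2 distance between two predicted action chunks.

    actions_a / actions_b: (horizon, action_dim) arrays predicted from the
    same observation under two different language prompts. A value near 0
    means the policy's output does not depend on the prompt.
    """
    assert actions_a.shape == actions_b.shape
    return float(np.linalg.norm(actions_a - actions_b, axis=-1).mean())

# Synthetic example: identical chunks -> sensitivity is exactly 0.0
chunk = np.zeros((50, 7))  # 50-step horizon, 7-DoF arm (illustrative shapes)
print(prompt_sensitivity(chunk, chunk))  # -> 0.0
```

Averaging this score over a held-out set of observations gives a single number you can track across fine-tuning runs to see whether language conditioning ever emerges.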
We trained the model on a relatively small dataset: around 250 videos, roughly 90 minutes in total. These limitations could partly stem from the limited training data. Nonetheless, the model seems fundamentally unable to adapt, instead merely replaying learned trajectories while ignoring both visual cues and language prompts.
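As a quick back-of-the-envelope check on the dataset scale described above (the per-episode figure is derived from the stated totals, not measured from the data):

```python
# Dataset scale stated in the report
total_minutes = 90
episodes = 250

# Average episode length implied by those totals
avg_seconds = total_minutes * 60 / episodes
print(f"{avg_seconds:.1f} s/episode")  # -> 21.6 s/episode
```

So the dataset amounts to ~250 short episodes of one fixed spatial layout, which is consistent with the observed behavior of memorizing a single trajectory distribution rather than learning a conditioned skill.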