Question about using quantile normalization in pi0.5 when fine-tuning
Hello,
First, thank you for your excellent work!
I am currently fine-tuning the pi0.5 model on an ARX arm using my own custom dataset. During this process, I've encountered a potential issue related to the `use_quantile_norm` parameter that significantly impacts performance.
📝 Description of the Issue
I have observed two distinct outcomes:
- When I manually modify the code to disable `use_quantile_norm` (set it to `False`), my fine-tuning process works very well, and the model achieves good performance.
- However, when `use_quantile_norm` is enabled (which appears to be the default for pi0.5), the model's performance during fine-tuning is much worse. The results are shown in the videos below.
https://github.com/user-attachments/assets/825fdfec-f5d3-44e6-abeb-8d6805903de9
`use_quantile_norm = False`
https://github.com/user-attachments/assets/398202c8-49a0-4c82-87ef-229c92abc720
`use_quantile_norm = True`
Evidence in Code
I investigated the codebase and found that this behavior seems to be hard-coded. In the file `openpi/src/openpi/training/config.py`, inside the `DataConfigFactory.create_base_config` method, the parameter is set as follows:
```python
# openpi/src/openpi/training/config.py (around line 186)
use_quantile_norm=model_config.model_type != ModelType.PI0,
)
```
This line of code automatically sets `use_quantile_norm` to `True` for any model that is not `ModelType.PI0` (which includes pi0.5). This prevents users from disabling it during fine-tuning without altering the core code.
💡 My Hypothesis
My suspicion is that this hard-coded value, while potentially correct for pre-training, may not be suitable for fine-tuning scenarios.
My hypothesis is that for fine-tuning, `use_quantile_norm` should ideally be disabled (`False`).
The reason is that fine-tuning datasets are often much smaller than the large-scale pre-training datasets. Applying quantile normalization to a small data distribution might be too aggressive, potentially clipping or distorting a significant amount of useful data (e.g., valid actions/observations). This could explain the severe performance degradation I am observing.
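To make this concrete, here is a small standalone NumPy sketch of the kind of distortion I mean. It is illustrative only: the q01/q99-to-[-1, 1] mapping below is a generic formula and may not match openpi's actual normalization transform, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model of one action dimension in a small fine-tuning dataset:
# most samples sit in a narrow band, plus a rare but valid mode
# (e.g. an occasional large corrective motion) with < 1% of the mass.
bulk = rng.normal(loc=0.2, scale=0.05, size=5000)
rare = np.full(40, 1.0)  # ~0.8% of the samples
actions = np.concatenate([bulk, rare])

# Generic quantile normalization: map [q01, q99] onto [-1, 1].
q01, q99 = np.quantile(actions, [0.01, 0.99])
q_norm = (actions - q01) / (q99 - q01) * 2.0 - 1.0

print(f"q01={q01:.3f}, q99={q99:.3f}")
print(f"rare mode after quantile norm: {q_norm[-1]:.1f}")
print(f"fraction outside [-1, 1]: {(np.abs(q_norm) > 1).mean():.2%}")
# By construction roughly 2% of the samples land outside [-1, 1]; how far
# outside depends on the tails, and here the rare mode ends up several
# units past the nominal range because q01/q99 only see the narrow bulk.
```

Whether this is actually what hurts my fine-tuning runs I cannot say for certain, but it is the kind of effect I have in mind when the quantile statistics come from a small, narrow dataset.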
🤔 Question and Proposed Solution
My questions for the maintainers are:
- Is this hard-coded behavior intended, even for fine-tuning?
- Would you be open to making this parameter configurable, or to defaulting it to `False` specifically for fine-tuning tasks? (A rough sketch of what configurability could look like follows this list.)
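To illustrate the second option, here is a rough, self-contained sketch of what such an override could look like. Everything below other than the names quoted from `config.py` above (`DataConfigFactory`, `use_quantile_norm`, `ModelType.PI0`) is hypothetical and simplified; it is not openpi's real class layout or signatures.

```python
import dataclasses
import enum


class ModelType(enum.Enum):
    """Stand-in for openpi's ModelType; only PI0 is taken from the quoted code."""
    PI0 = "pi0"
    OTHER = "other"  # represents pi0.5 / any non-pi0 model type


@dataclasses.dataclass(frozen=True)
class ModelConfig:
    """Stand-in for the model config passed into create_base_config."""
    model_type: ModelType


@dataclasses.dataclass(frozen=True)
class DataConfigFactory:
    # None keeps today's behavior (quantile norm for every non-pi0 model);
    # an explicit True/False lets a fine-tuning config override it.
    use_quantile_norm: bool | None = None

    def resolve_use_quantile_norm(self, model_config: ModelConfig) -> bool:
        # Hypothetical helper: create_base_config would consume this value
        # instead of the current hard-coded expression.
        if self.use_quantile_norm is not None:
            return self.use_quantile_norm
        return model_config.model_type != ModelType.PI0


# Defaults stay exactly as they are today for non-pi0 models...
assert DataConfigFactory().resolve_use_quantile_norm(ModelConfig(ModelType.OTHER)) is True
# ...but a fine-tuning config can opt out without editing config.py.
assert (
    DataConfigFactory(use_quantile_norm=False).resolve_use_quantile_norm(
        ModelConfig(ModelType.OTHER)
    )
    is False
)
```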
If the team agrees that this is a valid concern, I would be happy to submit a Pull Request to address it.
Thank you for your time and consideration!