Evaluation on SimplerEnv After Finetuning Pi05 Does Not Meet Expected Performance

Open zwbx opened this issue 1 week ago • 0 comments

Hi, thanks for your great efforts on this work. I was wondering whether there have been any attempts to test pi05 on the SimplerEnv benchmark. I tried it myself and found that it takes a lot of effort to make it work well.

What I have tried, with detailed settings:

- task: SimplerEnv, WidowX
- dataset: https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot
- evaluation: https://github.com/DelinQu/SimplerEnv-OpenVLA
- steps: 80k
- batch size: 1024 on 32 H100 (32 per GPU)
- lr: 5e-5
- norm: z-score (I also tried the default, quantile norm, which performs even worse)
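For readers unfamiliar with the two normalization options mentioned above, here is a minimal sketch of how z-score and quantile normalization are commonly defined. It is not taken from the openpi codebase: the function names, the `EPS` constant, and the choice of the 1st and 99th percentiles are assumptions made for illustration.

```python
import statistics

EPS = 1e-6  # assumed small constant to avoid division by zero


def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def zscore_normalize(values):
    """Shift to zero mean and scale by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + EPS) for v in values]


def quantile_normalize(values, low_q=1.0, high_q=99.0):
    """Map the [low_q, high_q] percentile range to roughly [-1, 1]."""
    q_lo = percentile(values, low_q)
    q_hi = percentile(values, high_q)
    return [2.0 * (v - q_lo) / (q_hi - q_lo + EPS) - 1.0 for v in values]


# Toy one-dimensional "action" values with a single outlier (made-up data).
actions = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]
z = zscore_normalize(actions)
qn = quantile_normalize(actions)
```

The two schemes treat outliers differently: z-score scaling lets a single extreme value inflate the standard deviation, while quantile scaling is driven by the chosen percentile range. Which one suits a given dataset is an empirical question, as the comparison in this issue suggests.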

Results (ignore the invalid numbers (nan), which come from the evaluation codebase):

| Task / metric | Pi05-ft | RT-1(Converged) | RT-1(15%) | RT-1-X | RT-2-X | Octo-Base | Octo-Small | RT-1(begin) | OpenVLA | RoboVLM |
|---|---|---|---|---|---|---|---|---|---|---|
| put_spoon_on_tablecloth/matching_partial | 0.708 | nan | nan | 0.167 | nan | 0.347 | 0.778 | nan | 0.041 | 0.375 |
| put_spoon_on_tablecloth/matching_entire | 0.542 | nan | nan | 0.0 | nan | 0.125 | 0.472 | nan | 0.0 | 0.208 |
| put_carrot_on_plate/matching_partial | 0.917 | nan | nan | 0.208 | nan | 0.528 | 0.278 | nan | 0.333 | 0.333 |
| put_carrot_on_plate/matching_entire | 0.667 | nan | nan | 0.042 | nan | 0.083 | 0.097 | nan | 0.0 | 0.25 |
| stack_green_block_on_yellow_block/matching_partial | 0.917 | nan | nan | 0.083 | nan | 0.319 | 0.403 | nan | 0.125 | 0.083 |
| stack_green_block_on_yellow_block/matching_entire | 0.5 | nan | nan | 0.0 | nan | 0.0 | 0.042 | nan | 0.0 | 0.083 |
| put_eggplant_in_basket/matching_partial | 0.208 | nan | nan | 0.0 | nan | 0.667 | 0.875 | nan | 0.083 | 0.0 |
| put_eggplant_in_basket/matching_entire | 0.208 | nan | nan | 0.0 | nan | 0.431 | 0.569 | nan | 0.041 | 0.0 |
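To compare checkpoints at a glance, the `matching_entire` rows above can be averaged per checkpoint while skipping the nan entries. The snippet below is only a convenience sketch: the values are copied from the results above, and the helper name `mean_ignoring_nan` is made up for illustration and is not part of the evaluation codebase.

```python
import math

nan = math.nan

CHECKPOINTS = [
    "Pi05-ft", "RT-1(Converged)", "RT-1(15%)", "RT-1-X", "RT-2-X",
    "Octo-Base", "Octo-Small", "RT-1(begin)", "OpenVLA", "RoboVLM",
]

# matching_entire success rates, one list per task, in CHECKPOINTS order.
ENTIRE = {
    "put_spoon_on_tablecloth": [0.5416666666666666, nan, nan, 0.0, nan, 0.125, 0.472, nan, 0.0, 0.208],
    "put_carrot_on_plate": [0.6666666666666666, nan, nan, 0.042, nan, 0.083, 0.097, nan, 0.0, 0.25],
    "stack_green_block_on_yellow_block": [0.5, nan, nan, 0.0, nan, 0.0, 0.042, nan, 0.0, 0.083],
    "put_eggplant_in_basket": [0.20833333333333334, nan, nan, 0.0, nan, 0.431, 0.569, nan, 0.041, 0.0],
}


def mean_ignoring_nan(values):
    """Average the non-nan entries; return nan if none are valid."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# Per-checkpoint average over the four tasks.
averages = {
    name: mean_ignoring_nan([row[i] for row in ENTIRE.values()])
    for i, name in enumerate(CHECKPOINTS)
}

for name, avg in averages.items():
    print(f"{name:16s} {avg:.3f}")
```

Checkpoints whose entries are all nan stay nan in the summary, so they are not mistaken for a zero success rate.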

Overall, the performance is somewhat worse than the results from an open-source implementation of pi0 (https://github.com/allenzren/open-pi-zero).

zwbx · Nov 23 '25 14:11