sense_act_nn_monolithic eval result
Thank you for the great work on GOAT-Bench!
I am trying to reproduce the evaluation results using the checkpoint provided in the repository. I followed the README instructions and used the following command:
```shell
export split="val_seen"
export eval_ckpt_path_dir="/root/workspace/lab/goat-bench/data/goat-assets/checkpoints/sense_act_nn_monolithic"

python -um goat_bench.run \
  --run-type eval \
  --exp-config config/experiments/ver_goat_monolithic.yaml \
  habitat_baselines.num_environments=1 \
  habitat_baselines.trainer_name="goat_ppo" \
  habitat_baselines.tensorboard_dir=$tensorboard_dir \
  habitat_baselines.eval_ckpt_path_dir=$eval_ckpt_path_dir \
  habitat.dataset.data_path="${DATA_PATH}/${split}/${split}.json.gz" \
  habitat_baselines.load_resume_state_config=False \
  habitat_baselines.eval.use_ckpt_config=False \
  habitat_baselines.eval.split=$split \
  habitat.task.lab_sensors.goat_goal_sensor.image_cache=/root/workspace/lab/goat-bench/data/goat-assets/goal_cache/iin/${split}_embeddings/ \
  habitat.task.lab_sensors.goat_goal_sensor.language_cache=/root/workspace/lab/goat-bench/data/goat-assets/goal_cache/language_nav/${split}_instruction_clip_embeddings.pkl
```
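One quick sanity check before digging into the metrics: confirm that every checkpoint and goal-cache path passed in the overrides actually exists on disk, since a missing embedding cache could silently degrade goal grounding. A minimal sketch (paths copied from the command above; adjust to your machine):

```python
from pathlib import Path

def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

if __name__ == "__main__":
    # Paths taken from the eval command above.
    paths = [
        "/root/workspace/lab/goat-bench/data/goat-assets/checkpoints/sense_act_nn_monolithic",
        "/root/workspace/lab/goat-bench/data/goat-assets/goal_cache/iin/val_seen_embeddings/",
        "/root/workspace/lab/goat-bench/data/goat-assets/goal_cache/language_nav/val_seen_instruction_clip_embeddings.pkl",
    ]
    for p in missing_paths(paths):
        print(f"MISSING: {p}")
```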
The run completed successfully, but the evaluation results show `task_success = 0.0000`, and I'm not sure whether this is expected. Here are the final average metrics from the log:
```
Average episode reward: 9.7726
Average episode distance_to_goal.distance_to_target: 4.8595
Average episode success.task_success: 0.0000
Average episode success.composite_success: 0.0028
Average episode success.partial_success: 0.1638
Average episode success.object_success: 0.2507
Average episode success.image_success: 0.0809
Average episode success.description_success: 0.1248
Average episode spl.composite_spl: 0.1065
Average episode soft_spl.composite_softspl: 0.0760
```
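For comparing these numbers across runs or splits, a small parser for this log format can help. This is just a sketch; the exact logger output may differ between habitat-baselines versions:

```python
import re

def parse_metrics(text):
    """Map metric name -> float for lines like 'Average episode <name>: <value>'."""
    pattern = re.compile(r"^Average episode ([\w.]+):\s+([-\d.]+)$", re.MULTILINE)
    return {name: float(val) for name, val in pattern.findall(text)}

# Example with a fragment of the log above.
LOG = """\
Average episode reward: 9.7726
Average episode success.task_success: 0.0000
Average episode success.composite_success: 0.0028
"""

metrics = parse_metrics(LOG)
```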