
Config Setting on Bi-DexHands domain

JensenLZX opened this issue 1 year ago · 2 comments

Problem: The reproduced result is much lower than the one reported in the paper

Details: I want to reproduce the results in the Bi-DexHands domain, using the script you provide directly:

#!/bin/sh
env="hands"
task="ShadowHandCatchOver2Underarm"
#ShadowHandDoorCloseOutward
#ShadowHandDoorOpenInward
#ShadowHandCatchOver2Underarm
algo="mat"
exp="single"
seed=1

echo "env is ${env}, task is ${task}, algo is ${algo}, exp is ${exp}, seed is ${seed}"
CUDA_VISIBLE_DEVICES=0 python train/train_hands.py --env_name ${env} --seed ${seed} --algorithm_name ${algo} --experiment_name ${exp} --task ${task} --n_rollout_threads 80 --lr 5e-5 --entropy_coef 0.001 --max_grad_norm 0.5 --eval_episodes 5 --log_interval 25 --n_training_threads 16 --num_mini_batch 1 --num_env_steps 50000000 --gamma 0.96 --ppo_epoch 5 --clip_param 0.2 --use_value_active_masks --add_center_xy --use_state_agent --use_policy_active_masks

However, running it raises an error:

usage: train_hands.py [-h] [--sim_device SIM_DEVICE] [--pipeline PIPELINE]
                      [--graphics_device_id GRAPHICS_DEVICE_ID]
                      [--flex | --physx] [--num_threads NUM_THREADS]
                      [--subscenes SUBSCENES] [--slices SLICES]
                      [--env_name ENV_NAME] [--algorithm_name ALGORITHM_NAME]
                      [--experiment_name EXPERIMENT_NAME] [--n_block N_BLOCK]
                      [--n_embd N_EMBD] [--lr LR]
                      [--value_loss_coef VALUE_LOSS_COEF]
                      [--entropy_coef ENTROPY_COEF]
                      [--max_grad_norm MAX_GRAD_NORM]
                      [--eval_episodes EVAL_EPISODES]
                      [--n_training_threads N_TRAINING_THREADS]
                      [--n_rollout_threads N_ROLLOUT_THREADS]
                      [--num_mini_batch NUM_MINI_BATCH]
                      [--num_env_steps NUM_ENV_STEPS] [--ppo_epoch PPO_EPOCH]
                      [--log_interval LOG_INTERVAL] [--clip_param CLIP_PARAM]
                      [--use_value_active_masks] [--use_eval]
                      [--add_center_xy] [--use_state_agent]
                      [--use_policy_active_masks] [--dec_actor]
                      [--share_actor] [--test] [--play] [--resume RESUME]
                      [--checkpoint CHECKPOINT] [--headless] [--horovod]
                      [--task TASK] [--task_type TASK_TYPE]
                      [--rl_device RL_DEVICE] [--logdir LOGDIR]
                      [--experiment EXPERIMENT] [--metadata]
                      [--cfg_train CFG_TRAIN] [--cfg_env CFG_ENV]
                      [--num_envs NUM_ENVS] [--episode_length EPISODE_LENGTH]
                      [--seed SEED] [--max_iterations MAX_ITERATIONS]
                      [--steps_num STEPS_NUM]
                      [--minibatch_size MINIBATCH_SIZE] [--randomize]
                      [--torch_deterministic] [--algo ALGO]
                      [--model_dir MODEL_DIR]
train_hands.py: error: unrecognized arguments: --gamma 0.96
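For context (a minimal sketch with toy parser and flag names, not the repo's actual code): argparse hard-fails on any flag the active parser never registered, which is exactly the "unrecognized arguments" error above when some other parser ends up handling the command line. `parse_known_args()` instead tolerates unknown flags and hands them back as leftovers:

```python
import argparse

# Hypothetical stand-in for a benchmark-side parser that does NOT define --gamma.
benchmark_parser = argparse.ArgumentParser()
benchmark_parser.add_argument("--lr", type=float, default=5e-5)

# parse_args() exits with "error: unrecognized arguments" on unknown flags...
try:
    benchmark_parser.parse_args(["--lr", "5e-5", "--gamma", "0.96"])
except SystemExit:
    print("parse_args() rejected --gamma")

# ...while parse_known_args() accepts what it knows and returns the rest,
# so a second parser can consume the leftovers.
args, leftover = benchmark_parser.parse_known_args(["--lr", "5e-5", "--gamma", "0.96"])
print(leftover)  # ['--gamma', '0.96']
```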

So I deleted the --gamma flag and modified config.py directly, setting the default to 0.96:

    parser.add_argument("--gamma", type=float, default=0.96,
                        help='discount factor for rewards (default: 0.99)')
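As a sanity check (a hedged standalone sketch, not the repo's file): changing the default this way should be equivalent to passing the flag on the command line, since argparse falls back to `default` whenever the flag is absent:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gamma", type=float, default=0.96,
                    help="discount factor for rewards")

# With no --gamma on the command line, the edited default applies.
args = parser.parse_args([])
print(args.gamma)  # 0.96
```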

However, I get the result:

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8250/8333 episodes, total num timesteps 49506000/50000000, FPS 1021.

average_step_rewards is 0.330600768327713.
some episodes done, average rewards:  19.574572331772863

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8275/8333 episodes, total num timesteps 49656000/50000000, FPS 1022.
average_step_rewards is 0.3444286584854126.
some episodes done, average rewards:  20.018084016291084

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8300/8333 episodes, total num timesteps 49806000/50000000, FPS 1023.
average_step_rewards is 0.3596132695674896.
some episodes done, average rewards:  20.760233263901018

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8325/8333 episodes, total num timesteps 49956000/50000000, FPS 1024.         
average_step_rewards is 0.3465554118156433.
some episodes done, average rewards:  20.917307748507582
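To put a number on the gap (my own arithmetic over the four episode rewards logged above, taking the paper's figure as roughly 25):

```python
# Episode rewards from the last four log lines above.
rewards = [19.574572331772863, 20.018084016291084,
           20.760233263901018, 20.917307748507582]
mean_reward = sum(rewards) / len(rewards)
print(round(mean_reward, 2))           # 20.32
print(round(1 - mean_reward / 25, 2))  # 0.19, i.e. roughly 19% below ~25
```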

This is far from the results in your paper (about 25). I guess some config may be set wrongly. Could I get the latest script, or any pointers on what I might be doing wrong?

JensenLZX avatar Feb 05 '24 08:02 JensenLZX

Hiya, thank you so much for your attention. I noticed that your error message contains many hyperparameters that are not in this repo, e.g. num_envs, cfg_train, steps_num... It seems that your config conflicts with something else in your local Python environment/workspace.

Thus, I recommend first finding the cause of this strange error before modifying the config file directly~~ hoping it might help you~~
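One way to track this down (a debugging sketch; the helper name and the toy parser are hypothetical, and `_actions` is a private argparse attribute used only for inspection) is to list which flags the parser that actually runs has registered, e.g. to check whether --gamma is there at all:

```python
import argparse

def dump_registered_flags(parser):
    """Return every option string the parser knows about, for debugging."""
    flags = []
    for action in parser._actions:  # private attribute, but handy for inspection
        flags.extend(action.option_strings)
    return flags

# Demo with a toy parser; in a real run you would call this on the parser
# built inside train_hands.py, just before parse_args() is invoked.
toy = argparse.ArgumentParser()
toy.add_argument("--num_envs", type=int)
print("--gamma registered?", "--gamma" in dump_registered_flags(toy))  # False
```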

morning9393 avatar Feb 05 '24 09:02 morning9393

Those hyperparameters were not introduced into this code by me; I just used the original code. It seems the Bi-DexHands benchmark introduces these configs. I haven't modified the source code; I simply cloned the current version locally and ran the script: ./mat/scripts/train_hands.sh.

I have just checked this: the unmodified original code reproduces the bug and gives the same error message.

Those hyperparameters may come from here: https://github.com/PKU-MARL/DexterousHands/blob/99c1e2a399fb084df5c02dbb5f6182d394fcd2e8/bidexhands/utils/config.py#L244 Thanks for your help in advance!

JensenLZX avatar Feb 05 '24 10:02 JensenLZX