Multi-Agent-Transformer
Config Setting on the Bi-DexHands domain
Problem: Reproduced results are much lower than those reported in the paper
Details: I want to reproduce the results on the Bi-DexHands domain, using the script you provide directly:
#!/bin/sh
env="hands"
task="ShadowHandCatchOver2Underarm"
#ShadowHandDoorCloseOutward
#ShadowHandDoorOpenInward
#ShadowHandCatchOver2Underarm
algo="mat"
exp="single"
seed=1
echo "env is ${env}, task is ${task}, algo is ${algo}, exp is ${exp}, seed is ${seed}"
CUDA_VISIBLE_DEVICES=0 python train/train_hands.py --env_name ${env} --seed ${seed} --algorithm_name ${algo} --experiment_name ${exp} --task ${task} --n_rollout_threads 80 --lr 5e-5 --entropy_coef 0.001 --max_grad_norm 0.5 --eval_episodes 5 --log_interval 25 --n_training_threads 16 --num_mini_batch 1 --num_env_steps 50000000 --gamma 0.96 --ppo_epoch 5 --clip_param 0.2 --use_value_active_masks --add_center_xy --use_state_agent --use_policy_active_masks
However, it raises an error:
usage: train_hands.py [-h] [--sim_device SIM_DEVICE] [--pipeline PIPELINE]
[--graphics_device_id GRAPHICS_DEVICE_ID]
[--flex | --physx] [--num_threads NUM_THREADS]
[--subscenes SUBSCENES] [--slices SLICES]
[--env_name ENV_NAME] [--algorithm_name ALGORITHM_NAME]
[--experiment_name EXPERIMENT_NAME] [--n_block N_BLOCK]
[--n_embd N_EMBD] [--lr LR]
[--value_loss_coef VALUE_LOSS_COEF]
[--entropy_coef ENTROPY_COEF]
[--max_grad_norm MAX_GRAD_NORM]
[--eval_episodes EVAL_EPISODES]
[--n_training_threads N_TRAINING_THREADS]
[--n_rollout_threads N_ROLLOUT_THREADS]
[--num_mini_batch NUM_MINI_BATCH]
[--num_env_steps NUM_ENV_STEPS] [--ppo_epoch PPO_EPOCH]
[--log_interval LOG_INTERVAL] [--clip_param CLIP_PARAM]
[--use_value_active_masks] [--use_eval]
[--add_center_xy] [--use_state_agent]
[--use_policy_active_masks] [--dec_actor]
[--share_actor] [--test] [--play] [--resume RESUME]
[--checkpoint CHECKPOINT] [--headless] [--horovod]
[--task TASK] [--task_type TASK_TYPE]
[--rl_device RL_DEVICE] [--logdir LOGDIR]
[--experiment EXPERIMENT] [--metadata]
[--cfg_train CFG_TRAIN] [--cfg_env CFG_ENV]
[--num_envs NUM_ENVS] [--episode_length EPISODE_LENGTH]
[--seed SEED] [--max_iterations MAX_ITERATIONS]
[--steps_num STEPS_NUM]
[--minibatch_size MINIBATCH_SIZE] [--randomize]
[--torch_deterministic] [--algo ALGO]
[--model_dir MODEL_DIR]
train_hands.py: error: unrecognized arguments: --gamma 0.96
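As far as I understand, this is standard argparse behaviour: whichever parser calls parse_args() on the full command line rejects any flag it does not define itself. A minimal sketch of such a conflict between two parsers (hypothetical code, not this repo's actual config):

import argparse

# Launcher-side parser (stands in for MAT's config): knows --gamma and
# deliberately tolerates flags meant for someone else.
mat_parser = argparse.ArgumentParser()
mat_parser.add_argument("--gamma", type=float, default=0.99)
mat_args, remaining = mat_parser.parse_known_args(["--gamma", "0.96", "--num_envs", "80"])
# mat_args.gamma == 0.96, remaining == ["--num_envs", "80"]

# Benchmark-side parser (stands in for Bi-DexHands' config): knows
# --num_envs but not --gamma, and uses the strict parse_args().
env_parser = argparse.ArgumentParser()
env_parser.add_argument("--num_envs", type=int, default=1)
env_parser.parse_args(["--gamma", "0.96", "--num_envs", "80"])
# -> SystemExit: error: unrecognized arguments: --gamma 0.96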
So I deleted the --gamma flag from the script and instead modified config.py directly, setting the default to 0.96:

parser.add_argument("--gamma", type=float, default=0.96,
                    help='discount factor for rewards (default: 0.96)')
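As a sanity check that the edited default is what training will actually see, one can print the parsed value with no flags on the command line (the mat.config / get_config() import below is my guess at this repo's layout, in the style of MAPPO-derived codebases; adjust the path if it differs):

# Hypothetical import path: MAPPO-style repos expose the parser builder
# as get_config() in a top-level config.py.
from mat.config import get_config

parser = get_config()
args, _ = parser.parse_known_args([])  # no CLI flags -> defaults only
print(args.gamma)  # expect 0.96 after the edit above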
However, I get the following result:
Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8250/8333 episodes, total num timesteps 49506000/50000000, FPS 1021.
average_step_rewards is 0.330600768327713.
some episodes done, average rewards: 19.574572331772863
Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8275/8333 episodes, total num timesteps 49656000/50000000, FPS 1022.
average_step_rewards is 0.3444286584854126.
some episodes done, average rewards: 20.018084016291084
Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8300/8333 episodes, total num timesteps 49806000/50000000, FPS 1023.
average_step_rewards is 0.3596132695674896.
some episodes done, average rewards: 20.760233263901018
Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8325/8333 episodes, total num timesteps 49956000/50000000, FPS 1024.
average_step_rewards is 0.3465554118156433.
some episodes done, average rewards: 20.917307748507582
This is far below your result in the paper (about 25). I suspect some config is set incorrectly. Could I get an up-to-date script, or any instructions on what I might be doing wrong?
Hiya, thank you so much for your attention. I noticed that your error message contains a lot of hyperparameters that are not in this repo, e.g. num_envs, cfg_train, steps_num... It seems that your config conflicts with something else in your local Python environment/workspace.
Thus, I recommend first finding out the cause of this strange error before modifying the config file directly~~ hope it helps you~~
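One quick way to check (a sketch; the module names below are guesses for a typical setup) is to ask Python where it actually resolves the relevant config modules from, which would expose a conflicting install shadowing this repo's code:

import importlib.util

# Candidate module names; adjust to whatever your traceback mentions.
for name in ("mat.config", "bidexhands.utils.config", "utils.config"):
    try:
        spec = importlib.util.find_spec(name)
        print(name, "->", spec.origin if spec else "not found")
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "bidexhands") is missing.
        print(name, "-> not found")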
Those hyperparameters are not introduced into this code by me; I am running the original code. It seems the Bi-DexHands benchmark introduces these config options. I haven't modified the source code at all: I just cloned the current version and ran the script ./mat/scripts/train_hands.sh.
I have checked this just now: a fresh clone of the original code reproduces the bug, with the same error message.
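For anyone debugging the same thing: one way to pin down which parser is rejecting --gamma is a tiny helper that lists the flags a live parser has actually registered. dump_flags below is my own throwaway aid, not repo code; it reads argparse's private _actions list, which is fine for a one-off debugging print:

import argparse

def dump_flags(parser: argparse.ArgumentParser) -> None:
    # _actions is argparse's internal registry of added arguments; each
    # action stores its option strings, e.g. ['--gamma'].
    flags = sorted(s for action in parser._actions for s in action.option_strings)
    print("\n".join(flags))

# Example with a throwaway parser; in practice, call it on the real
# parser object right before the parse_args() call that fails.
p = argparse.ArgumentParser()
p.add_argument("--task")
dump_flags(p)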
Those hyperparameters may come from here: https://github.com/PKU-MARL/DexterousHands/blob/99c1e2a399fb084df5c02dbb5f6182d394fcd2e8/bidexhands/utils/config.py#L244 Thanks in advance for your help!