regretful-agent
Performance Gap (significantly lower than reported)
Hi,
I have tried to reproduce the reported results, but my numbers are lower than those claimed in the paper. The results are shown below:
Evaluating on val_seen env ...
Epoch: [275][1/16] Time 1.875 (1.875) Loss inf (inf)
Epoch: [275][2/16] Time 1.779 (1.827) Loss inf (inf)
Epoch: [275][3/16] Time 1.974 (1.876) Loss inf (inf)
Epoch: [275][4/16] Time 1.946 (1.894) Loss inf (inf)
Epoch: [275][5/16] Time 1.816 (1.878) Loss inf (inf)
Epoch: [275][6/16] Time 1.790 (1.863) Loss inf (inf)
Epoch: [275][7/16] Time 1.829 (1.858) Loss inf (inf)
Epoch: [275][8/16] Time 1.910 (1.865) Loss inf (inf)
Epoch: [275][9/16] Time 1.683 (1.845) Loss inf (inf)
Epoch: [275][10/16] Time 1.947 (1.855) Loss inf (inf)
Epoch: [275][11/16] Time 1.788 (1.849) Loss inf (inf)
Epoch: [275][12/16] Time 1.887 (1.852) Loss inf (inf)
Epoch: [275][13/16] Time 1.575 (1.831) Loss inf (inf)
Epoch: [275][14/16] Time 1.704 (1.822) Loss inf (inf)
Epoch: [275][15/16] Time 1.492 (1.800) Loss inf (inf)
Epoch: [275][16/16] Time 1.724 (1.795) Loss inf (inf)
============================
success rate: 0.6317335945151812
rollback rate: 0.20372184133202742
rollback success rate: 0.07639569049951028
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 3.6218101292139466 | oracle_error: 2.1551905617521285 | steps: 7.056862745098039 | lengths: 12.267354546942984 | spl: 0.5606026181302899 | success_rate: 0.6284313725490196 | oracle_rate: 0.7441176470588236
Evaluating on val_unseen env ...
Epoch: [275][1/37] Time 1.413 (1.413) Loss inf (inf)
Epoch: [275][2/37] Time 1.424 (1.419) Loss inf (inf)
Epoch: [275][3/37] Time 1.279 (1.372) Loss inf (inf)
Epoch: [275][4/37] Time 1.418 (1.384) Loss inf (inf)
Epoch: [275][5/37] Time 1.312 (1.369) Loss inf (inf)
Epoch: [275][6/37] Time 1.146 (1.332) Loss inf (inf)
Epoch: [275][7/37] Time 1.152 (1.306) Loss inf (inf)
Epoch: [275][8/37] Time 1.043 (1.273) Loss inf (inf)
Epoch: [275][9/37] Time 1.016 (1.245) Loss inf (inf)
Epoch: [275][10/37] Time 1.085 (1.229) Loss inf (inf)
Epoch: [275][11/37] Time 1.080 (1.215) Loss inf (inf)
Epoch: [275][12/37] Time 0.976 (1.195) Loss inf (inf)
Epoch: [275][13/37] Time 0.997 (1.180) Loss inf (inf)
Epoch: [275][14/37] Time 1.005 (1.167) Loss inf (inf)
Epoch: [275][15/37] Time 1.037 (1.159) Loss inf (inf)
Epoch: [275][16/37] Time 0.944 (1.145) Loss inf (inf)
Epoch: [275][17/37] Time 0.899 (1.131) Loss inf (inf)
Epoch: [275][18/37] Time 0.955 (1.121) Loss inf (inf)
Epoch: [275][19/37] Time 0.891 (1.109) Loss inf (inf)
Epoch: [275][20/37] Time 0.906 (1.099) Loss inf (inf)
Epoch: [275][21/37] Time 0.889 (1.089) Loss inf (inf)
Epoch: [275][22/37] Time 0.888 (1.080) Loss inf (inf)
Epoch: [275][23/37] Time 0.855 (1.070) Loss inf (inf)
Epoch: [275][24/37] Time 0.890 (1.062) Loss inf (inf)
Epoch: [275][25/37] Time 0.840 (1.054) Loss inf (inf)
Epoch: [275][26/37] Time 0.883 (1.047) Loss inf (inf)
Epoch: [275][27/37] Time 0.857 (1.040) Loss inf (inf)
Epoch: [275][28/37] Time 0.831 (1.033) Loss inf (inf)
Epoch: [275][29/37] Time 0.843 (1.026) Loss inf (inf)
Epoch: [275][30/37] Time 0.820 (1.019) Loss inf (inf)
Epoch: [275][31/37] Time 0.850 (1.014) Loss inf (inf)
Epoch: [275][32/37] Time 0.901 (1.010) Loss inf (inf)
Epoch: [275][33/37] Time 0.832 (1.005) Loss inf (inf)
Epoch: [275][34/37] Time 0.839 (1.000) Loss inf (inf)
Epoch: [275][35/37] Time 0.918 (0.998) Loss inf (inf)
Epoch: [275][36/37] Time 0.826 (0.993) Loss inf (inf)
Epoch: [275][37/37] Time 0.839 (0.989) Loss inf (inf)
============================
success rate: 0.44146445295870584
rollback rate: 0.5525755640698169
rollback success rate: 0.16773094934014474
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 5.872141008455078 | oracle_error: 3.632631547700617 | steps: 8.510004257130694 | lengths: 15.75594853073863 | spl: 0.32215776203613483 | success_rate: 0.4384844614729672 | oracle_rate: 0.5696040868454662
In the paper, Table 1 (without data augmentation), the expected results are:
val_seen (NE | SR | OSR | SPL): 3.69 | 0.65 | 0.72 | 0.59
val_unseen (NE | SR | OSR | SPL): 5.36 | 0.48 | 0.61 | 0.37
However, I obtained val_seen SPL 0.56, 3 points lower, and val_unseen SPL 0.32, 5 points lower.
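For anyone comparing their own runs, the gap can be computed directly from the summary line that the evaluation prints. This is only a rough sketch: `parse_summary` and `gap_in_points` are hypothetical helpers, not part of the repository, and it assumes that the paper's SR and OSR columns correspond to the `success_rate` and `oracle_rate` fields in the log.

```python
# Paper values (Table 1, without data augmentation), as quoted in this issue.
# Assumption: SR -> success_rate, OSR -> oracle_rate in the printed summary.
PAPER = {
    "val_seen": {"success_rate": 0.65, "oracle_rate": 0.72, "spl": 0.59},
    "val_unseen": {"success_rate": 0.48, "oracle_rate": 0.61, "spl": 0.37},
}


def parse_summary(line):
    """Parse a '| key: value | key: value' summary line into a dict of floats."""
    metrics = {}
    for field in line.split("|"):
        if ":" in field:
            key, value = field.split(":", 1)
            metrics[key.strip()] = float(value)
    return metrics


def gap_in_points(split, summary_line):
    """Return obtained-minus-paper gaps in percentage points (negative = lower)."""
    obtained = parse_summary(summary_line)
    return {
        name: round(100 * (obtained[name] - paper_value), 1)
        for name, paper_value in PAPER[split].items()
    }


# Summary line copied from the val_seen run above.
val_seen_line = (
    "| nav_error: 3.6218101292139466 | oracle_error: 2.1551905617521285 "
    "| steps: 7.056862745098039 | lengths: 12.267354546942984 "
    "| spl: 0.5606026181302899 | success_rate: 0.6284313725490196 "
    "| oracle_rate: 0.7441176470588236"
)
print(gap_in_points("val_seen", val_seen_line))
```

Reporting the difference in percentage points rather than as a relative percentage keeps it directly comparable to the table in the paper.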
My configurations are posted as follows:
# Name Version Build Channel
python 3.8.2 hcf32534_0
pytorch 1.4.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
numpy 1.18.1 py38h4f9e942_0
networkx 2.4 pypi_0 pypi
torchvision 0.5.0 py38_cu101 pytorch
Can you help?
Even with pytorch 0.4.1, the performance gap still exists.
R2RBatch loaded with 2349 instructions, using splits: val_unseen
Evaluating on val_seen env ...
Epoch: [90][1/16] Time 1.582 (1.582) Loss inf (inf)
Epoch: [90][2/16] Time 1.595 (1.588) Loss inf (inf)
Epoch: [90][3/16] Time 1.803 (1.660) Loss inf (inf)
Epoch: [90][4/16] Time 1.772 (1.688) Loss inf (inf)
Epoch: [90][5/16] Time 1.628 (1.676) Loss inf (inf)
Epoch: [90][6/16] Time 1.615 (1.666) Loss inf (inf)
Epoch: [90][7/16] Time 1.651 (1.664) Loss inf (inf)
Epoch: [90][8/16] Time 1.746 (1.674) Loss inf (inf)
Epoch: [90][9/16] Time 1.500 (1.655) Loss inf (inf)
Epoch: [90][10/16] Time 1.787 (1.668) Loss inf (inf)
Epoch: [90][11/16] Time 1.587 (1.661) Loss inf (inf)
Epoch: [90][12/16] Time 1.690 (1.663) Loss inf (inf)
Epoch: [90][13/16] Time 1.364 (1.640) Loss inf (inf)
Epoch: [90][14/16] Time 1.530 (1.632) Loss inf (inf)
Epoch: [90][15/16] Time 1.277 (1.608) Loss inf (inf)
Epoch: [90][16/16] Time 1.512 (1.602) Loss inf (inf)
============================
success rate: 0.614103819784525
rollback rate: 0.15572967678746327
rollback success rate: 0.05484818805093046
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 3.85835455597601 | oracle_error: 2.362730659573285 | steps: 6.845098039215686 | lengths: 11.64444015803238 | spl: 0.5553807439701335 | success_rate: 0.6107843137254902 | oracle_rate: 0.6990196078431372
Evaluating on val_unseen env ...
Epoch: [90][1/37] Time 1.260 (1.260) Loss inf (inf)
Epoch: [90][2/37] Time 1.225 (1.242) Loss inf (inf)
Epoch: [90][3/37] Time 1.038 (1.174) Loss inf (inf)
Epoch: [90][4/37] Time 1.194 (1.179) Loss inf (inf)
Epoch: [90][5/37] Time 1.071 (1.157) Loss inf (inf)
Epoch: [90][6/37] Time 0.867 (1.109) Loss inf (inf)
Epoch: [90][7/37] Time 0.891 (1.078) Loss inf (inf)
Epoch: [90][8/37] Time 0.760 (1.038) Loss inf (inf)
Epoch: [90][9/37] Time 0.728 (1.004) Loss inf (inf)
Epoch: [90][10/37] Time 0.806 (0.984) Loss inf (inf)
Epoch: [90][11/37] Time 0.804 (0.968) Loss inf (inf)
Epoch: [90][12/37] Time 0.690 (0.944) Loss inf (inf)
Epoch: [90][13/37] Time 0.710 (0.926) Loss inf (inf)
Epoch: [90][14/37] Time 0.709 (0.911) Loss inf (inf)
Epoch: [90][15/37] Time 0.751 (0.900) Loss inf (inf)
Epoch: [90][16/37] Time 0.642 (0.884) Loss inf (inf)
Epoch: [90][17/37] Time 0.603 (0.868) Loss inf (inf)
Epoch: [90][18/37] Time 0.652 (0.856) Loss inf (inf)
Epoch: [90][19/37] Time 0.596 (0.842) Loss inf (inf)
Epoch: [90][20/37] Time 0.610 (0.830) Loss inf (inf)
Epoch: [90][21/37] Time 0.589 (0.819) Loss inf (inf)
Epoch: [90][22/37] Time 0.591 (0.808) Loss inf (inf)
Epoch: [90][23/37] Time 0.567 (0.798) Loss inf (inf)
Epoch: [90][24/37] Time 0.599 (0.790) Loss inf (inf)
Epoch: [90][25/37] Time 0.557 (0.780) Loss inf (inf)
Epoch: [90][26/37] Time 0.586 (0.773) Loss inf (inf)
Epoch: [90][27/37] Time 0.575 (0.766) Loss inf (inf)
Epoch: [90][28/37] Time 0.555 (0.758) Loss inf (inf)
Epoch: [90][29/37] Time 0.559 (0.751) Loss inf (inf)
Epoch: [90][30/37] Time 0.547 (0.744) Loss inf (inf)
Epoch: [90][31/37] Time 0.568 (0.739) Loss inf (inf)
Epoch: [90][32/37] Time 0.604 (0.734) Loss inf (inf)
Epoch: [90][33/37] Time 0.551 (0.729) Loss inf (inf)
Epoch: [90][34/37] Time 0.560 (0.724) Loss inf (inf)
Epoch: [90][35/37] Time 0.619 (0.721) Loss inf (inf)
Epoch: [90][36/37] Time 0.552 (0.716) Loss inf (inf)
Epoch: [90][37/37] Time 0.553 (0.712) Loss inf (inf)
============================
success rate: 0.45977011494252873
rollback rate: 0.42358450404427417
rollback success rate: 0.13452532992762878
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 5.879764965324538 | oracle_error: 3.5521622260626846 | steps: 7.909748829289059 | lengths: 14.349212914918965 | spl: 0.3555266413501602 | success_rate: 0.4559386973180077 | oracle_rate: 0.5751383567475522
# Name Version Build Channel
python 3.7.7 hcf32534_0_cpython
pytorch 0.4.1 py37_cuda9.2.148_cudnn7.1.4_1 [cuda92] pytorch
numpy 1.15.4 py37h7e9f1db_0
networkx 2.4 pypi_0 pypi
torchvision 0.2.1 py37_0
More information about the training and testing scripts:
#!/bin/sh
CUDA_VISIBLE_DEVICES=1 python tasks/R2R-pano/main.py \
--exp_name 'regretful-agent-data|real' \
--batch_size 64 \
--img_fc_dim 1024 \
--rnn_hidden_size 512 \
--eval_every_epochs 5 \
--arch 'regretful' \
--progress_marker 1
#!/bin/sh
CUDA_VISIBLE_DEVICES=1 python tasks/R2R-pano/main.py \
--exp_name 'regretful-agent-data|real' \
--batch_size 64 \
--img_fc_dim 1024 \
--rnn_hidden_size 512 \
--eval_every_epochs 5 \
--arch 'regretful' \
--progress_marker 1 \
--eval_only 1 \
--resume 'best'
In SPL, val_seen is 3.4 points lower and val_unseen is 1.4 points lower than reported.
I have no clue what is happening here.
Excuse me, I have a question: what is the difference between 'success rate' and 'success_rate' in the results? I would appreciate a reply.