
Performance Gap (significantly lower than reported)

Open convnets opened this issue 4 years ago • 2 comments

Hi,

I have tried to reproduce the reported results, but my numbers are lower than those claimed in the paper. The evaluation output is shown below:

Evaluating on val_seen env ...
Epoch: [275][1/16]      Time 1.875 (1.875)      Loss inf (inf)
Epoch: [275][2/16]      Time 1.779 (1.827)      Loss inf (inf)
Epoch: [275][3/16]      Time 1.974 (1.876)      Loss inf (inf)
Epoch: [275][4/16]      Time 1.946 (1.894)      Loss inf (inf)
Epoch: [275][5/16]      Time 1.816 (1.878)      Loss inf (inf)
Epoch: [275][6/16]      Time 1.790 (1.863)      Loss inf (inf)
Epoch: [275][7/16]      Time 1.829 (1.858)      Loss inf (inf)
Epoch: [275][8/16]      Time 1.910 (1.865)      Loss inf (inf)
Epoch: [275][9/16]      Time 1.683 (1.845)      Loss inf (inf)
Epoch: [275][10/16]     Time 1.947 (1.855)      Loss inf (inf)
Epoch: [275][11/16]     Time 1.788 (1.849)      Loss inf (inf)
Epoch: [275][12/16]     Time 1.887 (1.852)      Loss inf (inf)
Epoch: [275][13/16]     Time 1.575 (1.831)      Loss inf (inf)
Epoch: [275][14/16]     Time 1.704 (1.822)      Loss inf (inf)
Epoch: [275][15/16]     Time 1.492 (1.800)      Loss inf (inf)
Epoch: [275][16/16]     Time 1.724 (1.795)      Loss inf (inf)
============================
success rate: 0.6317335945151812
rollback rate: 0.20372184133202742
rollback success rate: 0.07639569049951028
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 3.6218101292139466 | oracle_error: 2.1551905617521285 | steps: 7.056862745098039 | lengths: 12.267354546942984 | spl: 0.5606026181302899 | success_rate: 0.6284313725490196 | oracle_rate: 0.7441176470588236
Evaluating on val_unseen env ...
Epoch: [275][1/37]      Time 1.413 (1.413)      Loss inf (inf)
Epoch: [275][2/37]      Time 1.424 (1.419)      Loss inf (inf)
Epoch: [275][3/37]      Time 1.279 (1.372)      Loss inf (inf)
Epoch: [275][4/37]      Time 1.418 (1.384)      Loss inf (inf)
Epoch: [275][5/37]      Time 1.312 (1.369)      Loss inf (inf)
Epoch: [275][6/37]      Time 1.146 (1.332)      Loss inf (inf)
Epoch: [275][7/37]      Time 1.152 (1.306)      Loss inf (inf)
Epoch: [275][8/37]      Time 1.043 (1.273)      Loss inf (inf)
Epoch: [275][9/37]      Time 1.016 (1.245)      Loss inf (inf)
Epoch: [275][10/37]     Time 1.085 (1.229)      Loss inf (inf)
Epoch: [275][11/37]     Time 1.080 (1.215)      Loss inf (inf)
Epoch: [275][12/37]     Time 0.976 (1.195)      Loss inf (inf)
Epoch: [275][13/37]     Time 0.997 (1.180)      Loss inf (inf)
Epoch: [275][14/37]     Time 1.005 (1.167)      Loss inf (inf)
Epoch: [275][15/37]     Time 1.037 (1.159)      Loss inf (inf)
Epoch: [275][16/37]     Time 0.944 (1.145)      Loss inf (inf)
Epoch: [275][17/37]     Time 0.899 (1.131)      Loss inf (inf)
Epoch: [275][18/37]     Time 0.955 (1.121)      Loss inf (inf)
Epoch: [275][19/37]     Time 0.891 (1.109)      Loss inf (inf)
Epoch: [275][20/37]     Time 0.906 (1.099)      Loss inf (inf)
Epoch: [275][21/37]     Time 0.889 (1.089)      Loss inf (inf)
Epoch: [275][22/37]     Time 0.888 (1.080)      Loss inf (inf)
Epoch: [275][23/37]     Time 0.855 (1.070)      Loss inf (inf)
Epoch: [275][24/37]     Time 0.890 (1.062)      Loss inf (inf)
Epoch: [275][25/37]     Time 0.840 (1.054)      Loss inf (inf)
Epoch: [275][26/37]     Time 0.883 (1.047)      Loss inf (inf)
Epoch: [275][27/37]     Time 0.857 (1.040)      Loss inf (inf)
Epoch: [275][28/37]     Time 0.831 (1.033)      Loss inf (inf)
Epoch: [275][29/37]     Time 0.843 (1.026)      Loss inf (inf)
Epoch: [275][30/37]     Time 0.820 (1.019)      Loss inf (inf)
Epoch: [275][31/37]     Time 0.850 (1.014)      Loss inf (inf)
Epoch: [275][32/37]     Time 0.901 (1.010)      Loss inf (inf)
Epoch: [275][33/37]     Time 0.832 (1.005)      Loss inf (inf)
Epoch: [275][34/37]     Time 0.839 (1.000)      Loss inf (inf)
Epoch: [275][35/37]     Time 0.918 (0.998)      Loss inf (inf)
Epoch: [275][36/37]     Time 0.826 (0.993)      Loss inf (inf)
Epoch: [275][37/37]     Time 0.839 (0.989)      Loss inf (inf)
============================
success rate: 0.44146445295870584
rollback rate: 0.5525755640698169
rollback success rate: 0.16773094934014474
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 5.872141008455078 | oracle_error: 3.632631547700617 | steps: 8.510004257130694 | lengths: 15.75594853073863 | spl: 0.32215776203613483 | success_rate: 0.4384844614729672 | oracle_rate: 0.5696040868454662

According to Table 1 of the paper (without data augmentation), the expected results are:

val_seen   (NE | SR | OSR | SPL): 3.69 | 0.65 | 0.72 | 0.59
val_unseen (NE | SR | OSR | SPL): 5.36 | 0.48 | 0.61 | 0.37

However, I obtained a val_seen SPL of 0.56 (3 points lower) and a val_unseen SPL of 0.32 (5 points lower).
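The gap can be recomputed for every metric from the numbers quoted in this post. The snippet below is only a minimal sketch: the dictionary and function names are made up for illustration and are not part of the repository, and it assumes that NE, SR, and OSR in the paper correspond to `nav_error`, `success_rate`, and `oracle_rate` in the log.

```python
# Values copied from this post: Table 1 (no data augmentation) vs. the epoch-275 run.
# Assumption: NE/SR/OSR/SPL map to nav_error/success_rate/oracle_rate/spl in the log.
reported = {
    "val_seen":   {"nav_error": 3.69, "success_rate": 0.65, "oracle_rate": 0.72, "spl": 0.59},
    "val_unseen": {"nav_error": 5.36, "success_rate": 0.48, "oracle_rate": 0.61, "spl": 0.37},
}
obtained = {
    "val_seen":   {"nav_error": 3.6218, "success_rate": 0.6284, "oracle_rate": 0.7441, "spl": 0.5606},
    "val_unseen": {"nav_error": 5.8721, "success_rate": 0.4385, "oracle_rate": 0.5696, "spl": 0.3222},
}


def gaps(reported, obtained):
    """Return obtained - reported for every split and metric (hypothetical helper)."""
    return {
        split: {name: obtained[split][name] - value for name, value in metrics.items()}
        for split, metrics in reported.items()
    }


diff = gaps(reported, obtained)
for split, metrics in diff.items():
    for name, delta in metrics.items():
        # For nav_error a negative delta is an improvement; for the rates it is a drop.
        print(f"{split:10s} {name:12s} {delta:+.4f}")
```

Read this way, the shortfall is concentrated in SR and SPL, while val_seen NE and OSR are actually slightly better than the reported values.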

My configurations are posted as follows:

# Name                    Version                   Build  Channel
python                    3.8.2                hcf32534_0
pytorch                   1.4.0           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
numpy                     1.18.1           py38h4f9e942_0
networkx                  2.4                      pypi_0    pypi
torchvision               0.5.0                py38_cu101    pytorch

Can you help?

convnets, Apr 06 '20 01:04

Even with PyTorch 0.4.1, the performance gap persists:

R2RBatch loaded with 2349 instructions, using splits: val_unseen
Evaluating on val_seen env ...
Epoch: [90][1/16]       Time 1.582 (1.582)      Loss inf (inf)
Epoch: [90][2/16]       Time 1.595 (1.588)      Loss inf (inf)
Epoch: [90][3/16]       Time 1.803 (1.660)      Loss inf (inf)
Epoch: [90][4/16]       Time 1.772 (1.688)      Loss inf (inf)
Epoch: [90][5/16]       Time 1.628 (1.676)      Loss inf (inf)
Epoch: [90][6/16]       Time 1.615 (1.666)      Loss inf (inf)
Epoch: [90][7/16]       Time 1.651 (1.664)      Loss inf (inf)
Epoch: [90][8/16]       Time 1.746 (1.674)      Loss inf (inf)
Epoch: [90][9/16]       Time 1.500 (1.655)      Loss inf (inf)
Epoch: [90][10/16]      Time 1.787 (1.668)      Loss inf (inf)
Epoch: [90][11/16]      Time 1.587 (1.661)      Loss inf (inf)
Epoch: [90][12/16]      Time 1.690 (1.663)      Loss inf (inf)
Epoch: [90][13/16]      Time 1.364 (1.640)      Loss inf (inf)
Epoch: [90][14/16]      Time 1.530 (1.632)      Loss inf (inf)
Epoch: [90][15/16]      Time 1.277 (1.608)      Loss inf (inf)
Epoch: [90][16/16]      Time 1.512 (1.602)      Loss inf (inf)
============================
success rate: 0.614103819784525
rollback rate: 0.15572967678746327
rollback success rate: 0.05484818805093046
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 3.85835455597601 | oracle_error: 2.362730659573285 | steps: 6.845098039215686 | lengths: 11.64444015803238 | spl: 0.5553807439701335 | success_rate: 0.6107843137254902 | oracle_rate: 0.6990196078431372
Evaluating on val_unseen env ...
Epoch: [90][1/37]       Time 1.260 (1.260)      Loss inf (inf)
Epoch: [90][2/37]       Time 1.225 (1.242)      Loss inf (inf)
Epoch: [90][3/37]       Time 1.038 (1.174)      Loss inf (inf)
Epoch: [90][4/37]       Time 1.194 (1.179)      Loss inf (inf)
Epoch: [90][5/37]       Time 1.071 (1.157)      Loss inf (inf)
Epoch: [90][6/37]       Time 0.867 (1.109)      Loss inf (inf)
Epoch: [90][7/37]       Time 0.891 (1.078)      Loss inf (inf)
Epoch: [90][8/37]       Time 0.760 (1.038)      Loss inf (inf)
Epoch: [90][9/37]       Time 0.728 (1.004)      Loss inf (inf)
Epoch: [90][10/37]      Time 0.806 (0.984)      Loss inf (inf)
Epoch: [90][11/37]      Time 0.804 (0.968)      Loss inf (inf)
Epoch: [90][12/37]      Time 0.690 (0.944)      Loss inf (inf)
Epoch: [90][13/37]      Time 0.710 (0.926)      Loss inf (inf)
Epoch: [90][14/37]      Time 0.709 (0.911)      Loss inf (inf)
Epoch: [90][15/37]      Time 0.751 (0.900)      Loss inf (inf)
Epoch: [90][16/37]      Time 0.642 (0.884)      Loss inf (inf)
Epoch: [90][17/37]      Time 0.603 (0.868)      Loss inf (inf)
Epoch: [90][18/37]      Time 0.652 (0.856)      Loss inf (inf)
Epoch: [90][19/37]      Time 0.596 (0.842)      Loss inf (inf)
Epoch: [90][20/37]      Time 0.610 (0.830)      Loss inf (inf)
Epoch: [90][21/37]      Time 0.589 (0.819)      Loss inf (inf)
Epoch: [90][22/37]      Time 0.591 (0.808)      Loss inf (inf)
Epoch: [90][23/37]      Time 0.567 (0.798)      Loss inf (inf)
Epoch: [90][24/37]      Time 0.599 (0.790)      Loss inf (inf)
Epoch: [90][25/37]      Time 0.557 (0.780)      Loss inf (inf)
Epoch: [90][26/37]      Time 0.586 (0.773)      Loss inf (inf)
Epoch: [90][27/37]      Time 0.575 (0.766)      Loss inf (inf)
Epoch: [90][28/37]      Time 0.555 (0.758)      Loss inf (inf)
Epoch: [90][29/37]      Time 0.559 (0.751)      Loss inf (inf)
Epoch: [90][30/37]      Time 0.547 (0.744)      Loss inf (inf)
Epoch: [90][31/37]      Time 0.568 (0.739)      Loss inf (inf)
Epoch: [90][32/37]      Time 0.604 (0.734)      Loss inf (inf)
Epoch: [90][33/37]      Time 0.551 (0.729)      Loss inf (inf)
Epoch: [90][34/37]      Time 0.560 (0.724)      Loss inf (inf)
Epoch: [90][35/37]      Time 0.619 (0.721)      Loss inf (inf)
Epoch: [90][36/37]      Time 0.552 (0.716)      Loss inf (inf)
Epoch: [90][37/37]      Time 0.553 (0.712)      Loss inf (inf)
============================
success rate: 0.45977011494252873
rollback rate: 0.42358450404427417
rollback success rate: 0.13452532992762878
oscillating rate: 0.0
oscillating success rate: 0.0
============================
| nav_error: 5.879764965324538 | oracle_error: 3.5521622260626846 | steps: 7.909748829289059 | lengths: 14.349212914918965 | spl: 0.3555266413501602 | success_rate: 0.4559386973180077 | oracle_rate: 0.5751383567475522

The configuration for this run:

# Name                    Version                   Build  Channel
python                    3.7.7           hcf32534_0_cpython
pytorch                   0.4.1           py37_cuda9.2.148_cudnn7.1.4_1  [cuda92]  pytorch
numpy                     1.15.4           py37h7e9f1db_0
networkx                  2.4                      pypi_0    pypi
torchvision               0.2.1                    py37_0

More information about the training and testing scripts:

#!/bin/sh
CUDA_VISIBLE_DEVICES=1 python tasks/R2R-pano/main.py \
    --exp_name 'regretful-agent-data|real' \
    --batch_size 64 \
    --img_fc_dim 1024 \
    --rnn_hidden_size 512 \
    --eval_every_epochs 5 \
    --arch 'regretful' \
    --progress_marker 1

#!/bin/sh
CUDA_VISIBLE_DEVICES=1 python tasks/R2R-pano/main.py \
    --exp_name 'regretful-agent-data|real' \
    --batch_size 64 \
    --img_fc_dim 1024 \
    --rnn_hidden_size 512 \
    --eval_every_epochs 5 \
    --arch 'regretful' \
    --progress_marker 1 \
    --eval_only 1 \
    --resume 'best'

With this setup, val_seen SPL is about 3.5 points lower than reported (0.555 vs. 0.59) and val_unseen SPL is about 1.4 points lower (0.356 vs. 0.37). I have no clue what is happening here.
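To avoid copying numbers by hand, the `| key: value | ...` summary line printed at the end of each evaluation can be parsed directly. This is a rough sketch, not code from the repository: `parse_summary` is a hypothetical helper, the two input strings are shortened copies of the epoch-90 summary lines above, and the reference SPL values are the Table 1 numbers quoted earlier in this issue.

```python
def parse_summary(line):
    """Parse a '| key: value | key: value ...' log line into a dict of floats."""
    metrics = {}
    for field in line.split("|"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece before the leading '|'
        key, value = field.split(":", 1)
        metrics[key.strip()] = float(value)
    return metrics


# Shortened copies of the summary lines from the PyTorch 0.4.1 run (epoch 90).
runs = {
    "val_seen": parse_summary(
        "| nav_error: 3.85835455597601 | spl: 0.5553807439701335 | success_rate: 0.6107843137254902"
    ),
    "val_unseen": parse_summary(
        "| nav_error: 5.879764965324538 | spl: 0.3555266413501602 | success_rate: 0.4559386973180077"
    ),
}

# SPL from Table 1 of the paper (without data augmentation), as quoted in this issue.
paper_spl = {"val_seen": 0.59, "val_unseen": 0.37}

# Gap in percentage points: positive means the run is below the paper.
gap_points = {split: (paper_spl[split] - m["spl"]) * 100 for split, m in runs.items()}
for split, gap in gap_points.items():
    print(f"{split}: SPL {runs[split]['spl']:.3f}, {gap:.1f} points below the paper")
```

The gaps are expressed in percentage points of SPL rather than as relative percentages, which matches how the numbers are compared in this thread.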

convnets, Apr 11 '20 03:04

Excuse me, I have a question. What is the difference between 'success rate' and 'success_rate' in the results? I hope to get your reply.

Hannah-hh, May 18 '20 15:05