Potential reproducibility issue with PyTorch >1.0.0

Open chihyaoma opened this issue 4 years ago • 2 comments

Hi all,

Thank you so much for your interest in the project and the released code.

When we released the code, we made sure that it could robustly reproduce the numbers reported in the paper. Since then, several people who tried the code have confirmed to me that they can also reproduce the results.

However, since the second week of September, I have received a few emails from people reporting that they cannot reproduce the results for either the Self-Monitoring agent or the Regretful agent.

I decided to create this issue now so that people who are interested in the proposed method can run the code and continue their research with caution. Currently, I suspect the issue is caused by version differences in PyTorch (or possibly in other Python/CUDA libraries that I am using) that lead to unexpected behavior.

Given the current conference deadlines, I expect to be able to start investigating this issue during the winter break (end of December) at the earliest.


Below are the experimental setups that I used for developing and releasing the code. I hope they help in reproducing the results.

Code development:
PyTorch: 0.4.1, CUDA: 9.2.148, cuDNN: 7104

I also tested the following setup when releasing the code and confirmed that it reproduces the results:
PyTorch: 1.0.0, CUDA: 10.0.130, cuDNN: 7401
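For anyone comparing their own machine against the setups listed here, the following is a minimal sketch of how the same three numbers could be collected. The function name `describe_environment` is made up for illustration and is not part of the released code; the sketch assumes only that PyTorch, if installed, exposes `torch.__version__`, `torch.version.cuda`, and `torch.backends.cudnn.version()`.

```python
import platform


def describe_environment():
    """Collect the version numbers relevant to this issue as a dict.

    Hypothetical helper for illustration; not part of the released code.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; record that instead of failing.
        info["pytorch"] = None
        return info
    info["pytorch"] = torch.__version__
    # CUDA version PyTorch was built against (None on CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer such as 7104 (None if unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(f"{name}: {value}")
```

Pasting this output into a report makes it easier to line up a failing run against the two setups above.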

chihyaoma • Nov 04 '19 05:11

Hi Chih-Yao,

Thank you for opening this issue. I sent you an email a few hours ago. I will post my experimental setup and results here so that you can debug later.

I am using:

PyTorch: 1.2.0a0+e6a7071 CUDA: 10.1 Cudnn: 7602

with this command:

CUDA_VISIBLE_DEVICES=0 python tasks/R2R-pano/main.py \
    --exp_name 'regretful-agent-data|real' \
    --batch_size 64 \
    --img_fc_dim 1024 \
    --rnn_hidden_size 512 \
    --eval_every_epochs 5 \
    --arch 'regretful' \
    --progress_marker 1

And here are the screenshots of my TensorBoard:
[Image: Screenshot 2019-11-04 at 6 56 54 pm]
[Image: Screenshot 2019-11-04 at 6 57 40 pm]

Looking forward to hearing from you soon.

liuhualin333 • Nov 04 '19 10:11

@chihyaoma @liuhualin333 Have you been able to reproduce the results with PyTorch 1.2.0 and CUDA 10.1? I have the same problem. My configuration is as follows.

# Name                    Version                   Build  Channel
python                    3.8.2                hcf32534_0
pytorch                   1.4.0           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
numpy                     1.18.1           py38h4f9e942_0
networkx                  2.4                      pypi_0    pypi
torchvision               0.5.0                py38_cu101    pytorch
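As a rough way to tell whether an installed PyTorch is one of the two versions reported in this thread as reproducing the results (0.4.1 and 1.0.0), here is a small sketch. The names `TESTED_VERSIONS`, `release_tuple`, and `is_tested` are hypothetical and not from the repository, and the parser is a simplification: it keeps only the leading numeric part of the version string, so a development build such as `1.2.0a0+e6a7071` is treated as 1.2.0.

```python
import re

# PyTorch versions reported in this thread as reproducing the results.
TESTED_VERSIONS = {(0, 4, 1), (1, 0, 0)}


def release_tuple(version):
    """Return the leading (major, minor, patch) numbers of a version string.

    Simplification: pre-release and local suffixes (e.g. 'a0+e6a7071')
    are ignored, and a missing patch number is read as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_tested(version):
    """True if the version matches one of the setups reported as working."""
    return release_tuple(version) in TESTED_VERSIONS


if __name__ == "__main__":
    for v in ["0.4.1", "1.0.0", "1.2.0a0+e6a7071", "1.4.0"]:
        status = "tested" if is_tested(v) else "not tested"
        print(f"{v}: {status}")
```

A version flagged as not tested is not necessarily broken; it only means nobody in this thread has reported reproducing the results with it.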

convnets • Apr 05 '20 15:04