RelationNetworks-CLEVR
The logfile is not showing any runs for the test set. The plots also show nothing for the test set or for accuracy.
When I run the code, I get the following output:
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python
Python 3.6.6 (default, Jun 28 2018, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> exit()
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ pyton -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
No command 'pyton' found, did you mean:
Command 'python' from package 'python-minimal' (main)
Command 'pytone' from package 'pytone' (universe)
pyton: command not found
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
TRAIN: 0%| | 0/350 [00:00<?, ?it/s]
Loaded hyperparameters from configuration config.json, model: original-fp: {'state_description': False, 'g_layers': [256, 256, 256, 256], 'question_injection_position': 0, 'f_fc1': 256, 'f_fc2': 256, 'dropout': 0.5, 'lstm_hidden': 128, 'lstm_word_emb': 32, 'rl_in_size': 52}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_val_questions.pkl
CLEVR dataset initialized!
Supposing original DeepMind model
Training (350 epochs) is starting...
Dataset reinitialized with batch size 640
Current learning rate: 1e-05
T
raceback (most recent call last):███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1093/1094 [11:21:28<00:37, 37.41s/it, loss=1.92]
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 418, in <module>
    main(args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 356, in main
    train(clevr_train_loader, model, optimizer, epoch, args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 40, in train
    output = model(img, qst)
  File "/data/Rudra/virtualenvs/rn_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Rudra/RelationNetworks-CLEVR/model.py", line 200, in forward
    x = torch.cat([x, self.coord_tensor], 1) # (B x 24+2 x 8*8)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 469 and 640 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Train Epoch: 1 [0/700160 (0%)] Train loss: 39.945804595947266
Train Epoch: 1 [6400/700160 (1%)] Train loss: 36.57775611877442
Train Epoch: 1 [12800/700160 (2%)] Train loss: 29.848896408081053
Train Epoch: 1 [19200/700160 (3%)] Train loss: 24.984291648864748
Train Epoch: 1 [25600/700160 (4%)] Train loss: 20.945134353637695
.
.
.
Train Epoch: 1 [684800/700160 (98%)] Train loss: 1.8508247494697572
Train Epoch: 1 [691200/700160 (99%)] Train loss: 1.8768051743507386
Train Epoch: 1 [697600/700160 (100%)] Train loss: 1.8581566572189332
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$
I have also attached my logfile. When I run the plot function, every plot is empty except the one for training loss. Please let me know where the issue might be. Thanks.
Hi @saharudra, this is probably caused by how batches are handled in the multi-GPU setup. You should be able to run the code by simply removing the condition (the entire line) at https://github.com/mesnico/RelationNetworks-CLEVR/blob/b8e0e7af12408877c8a18d8f2802d88138605983/model.py#L196 This is not the most efficient solution; however, if that is the problem, I will fix it permanently with a better approach as soon as possible. Thanks!
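For reference, the sizes in the traceback are consistent with a short final batch. The sketch below only redoes the arithmetic with numbers taken from the log above (batch size 640, failure at iteration 1093/1094, 469 samples in the mismatching tensor). The helper name `batch_sizes` is hypothetical and not part of this repository, and the sample count is inferred from the log, not read from the dataset.

```python
def batch_sizes(num_samples, batch_size):
    """Return the size of every batch when the last partial batch is kept."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


# Assumption inferred from the log: 1093 full batches of 640, then one of 469.
num_samples = 1093 * 640 + 469
sizes = batch_sizes(num_samples, 640)

print(len(sizes))   # 1094 iterations per epoch, matching the progress bar
print(sizes[-1])    # 469, the size reported in the RuntimeError
```

If a tensor built for 640 samples is kept and reused, as `self.coord_tensor` appears to be in the traceback, concatenating it with a 469-sample batch along dimension 1 would fail in exactly this way. Besides removing the condition as suggested above, PyTorch's `DataLoader` accepts `drop_last=True`, which skips the partial batch; whether that is acceptable for this training script is an assumption that has not been checked against the repository.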
Hi @mesnico, I will give this a try and let you know the outcome here. Thanks!