Errors while running KRISP code
❓ Questions and Help
Hi!
I am running the KRISP project code in mmf, but I ran into some errors.
- The torch-sparse module is missing from the KRISP project's requirements.txt.
- When I installed a torch-sparse build matching CUDA 10.2, I got the error below:
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option config to ./projects/krisp/configs/krisp/okvqa/train_val.yaml
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option run_type to train_val
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option datasets to okvqa
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option model to krisp
2021-06-22T16:32:14 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:14 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:12572
2021-06-22T16:32:14 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:14 | mmf.utils.distributed: Distributed Init (Rank 4): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 2
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 5): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 5
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 7): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 7
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 6): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 6
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 3
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 4
2021-06-22T16:32:16 | root: Added key: store_based_barrier_key:1 to store for rank: 1
2021-06-22T16:32:16 | root: Added key: store_based_barrier_key:1 to store for rank: 0
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 0
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 2
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 5
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 3
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 6
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 7
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 4
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 1
2021-06-22T16:32:21 | mmf: Logging to: ./save/train.log
2021-06-22T16:32:21 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=./projects/krisp/configs/krisp/okvqa/train_val.yaml', 'run_type=train_val', 'dataset=okvqa', 'model=krisp'])
2021-06-22T16:32:21 | mmf_cli.run: Torch version: 1.8.1+cu102
2021-06-22T16:32:21 | mmf.utils.general: CUDA Device 0 is: GeForce RTX 2080 Ti
2021-06-22T16:32:21 | mmf_cli.run: Using seed 21664516
2021-06-22T16:32:21 | mmf.trainers.mmf_trainer: Loading datasets
okvqa/defaults/annotations/annotations/graph_vocab/graph_vocab.pth.tar
/home/aimaster/.cache/torch/mmf/data
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.trainers.mmf_trainer: Loading model
Import error with KRISP dependencies. Fix dependencies if you want to use KRISP
Traceback (most recent call last):
File "/home/aimaster/anaconda3/envs/mmf/bin/mmf_run", line 33, in
-- Process 6 terminated with the following error:
Traceback (most recent call last):
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf_cli/run.py", line 66, in distributed_main
main(configuration, init_distributed=True, predict=predict)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf_cli/run.py", line 52, in main
trainer.load()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/mmf_trainer.py", line 42, in load
super().load()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/base_trainer.py", line 33, in load
self.load_model()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/mmf_trainer.py", line 96, in load_model
self.model = build_model(attributes)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/utils/build.py", line 87, in build_model
model = model_class(config)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/models/krisp.py", line 39, in __init__
self.build()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/models/krisp.py", line 75, in build
from projects.krisp.graphnetwork_module import GraphNetworkModule
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/projects/krisp/graphnetwork_module.py", line 21, in
Hi @jiny419, thanks for using mmf.
Do you mind sharing the command you used to run? Tagging @KMarino to help with KRISP-related issues.
Yes, I didn't include the PyTorch Geometric dependencies because they are system- and CUDA-version dependent. See the PyTorch Geometric installation instructions for how to install them on your system:
https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
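For a setup like the one in this thread (torch 1.8.1 + CUDA 10.2), the install typically looks like the sketch below. The wheel index URL is an assumption based on the usual PyG pattern; verify the exact URL for your torch/CUDA combination against the linked instructions before running it.

```shell
# Compiled extensions come from a prebuilt wheel index matching your
# torch + CUDA versions (verify this URL against the PyG install docs).
pip install torch-scatter torch-sparse \
    -f https://data.pyg.org/whl/torch-1.8.1+cu102.html

# torch-geometric itself is pure Python and needs no wheel index.
pip install torch-geometric
```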
@ytsheng I ran "mmf_run config=./projects/krisp/configs/krisp/okvqa/train_val.yaml run_type=train_val dataset=okvqa model=krisp" with the proper path to my project. @KMarino I installed the dependencies such as torch-sparse and torch-geometric matching my CUDA version, but I hit the error above, specifically "torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults" in mmf's distributed.py.
I think the warning function in distributed.py conflicted with the get_layout function of torch-sparse, and I have now solved it. Thank you!
Could you elaborate on the solution to the above conflict? Thank you!
The warning function in distributed.py conflicted with the get_layout function of torch-sparse; just comment out the warning function.
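For anyone hitting the same NotSupportedError: TorchScript cannot compile functions that take `*args`/`**kwargs` or keyword-only arguments with defaults, which is why scripting tripped over the warning helper. A stdlib-only sketch of a check that flags such signatures (the `warn_once` helper below is a made-up stand-in, not mmf's actual function):

```python
import inspect

def torchscript_unfriendly_params(fn):
    """Return parameter names TorchScript would reject:
    *args/**kwargs, or keyword-only arguments with defaults."""
    bad = []
    for name, p in inspect.signature(fn).parameters.items():
        if p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            bad.append(name)
        elif p.kind is p.KEYWORD_ONLY and p.default is not p.empty:
            bad.append(name)
    return bad

# Hypothetical stand-in for the kind of helper that caused the conflict.
def warn_once(msg, *args, stacklevel=2):
    pass

print(torchscript_unfriendly_params(warn_once))  # ['args', 'stacklevel']
```

Besides commenting the helper out, rewriting it with a plain positional signature, or marking it with `@torch.jit.ignore` so the compiler skips it, are the usual ways to avoid this class of error.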