DiGress
Reporting another problem with the code
Hello, I have this problem: `The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1`. How do I solve it?
Hello, are you using the latest version of the code, with the packages specified by the new requirements.txt? Which branch are you using? Thanks
Hi, I'm using the latest version of the code, but I'm running it on Windows. If I run main.py directly, do I need to pass any extra arguments? The error output is below. Thank you!
```
Found rdkit, all good
Dataset smiles were found.
E:\anaconda\envs\digress\lib\site-packages\torch\nn\init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Marginal distribution of the classes: tensor([0.7230, 0.1151, 0.1593, 0.0026]) for nodes, tensor([0.7261, 0.2384, 0.0274, 0.0081, 0.0000]) for edges
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
distributed_backend=nccl All distributed processes registered. Starting with 1 processes
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high')
which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Error executing job with overrides: []
Traceback (most recent call last):
File "E:\DiGress-main\src\main.py", line 202, in main
trainer.fit(model, datamodule=datamodule, ckpt_path=cfg.general.resume)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 531, in fit
call._call_and_handle_interrupt(
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\call.py", line 41, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 91, in launch
return function(*args, **kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 570, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 951, in _run
self.strategy.setup(self)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 164, in setup
self.configure_ddp()
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 269, in configure_ddp
self.model = self._setup_model(_LightningModuleWrapperBase(self.model))
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 183, in _setup_model
return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\torch\nn\parallel\distributed.py", line 657, in init
_sync_module_states(
File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 136, in _sync_module_states
_sync_params_and_buffers(
File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 154, in _sync_params_and_buffers
dist._broadcast_coalesced(
RuntimeError: The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
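
As a side note, the two hints printed in the log itself (the Tensor Cores warning and the `HYDRA_FULL_ERROR` suggestion) are unrelated to the crash, but applying them gives cleaner and more complete output. A minimal sketch, assuming these lines go at the very top of main.py (the environment variable can equally be set in the shell before launching):

```python
import os

import torch

# Hedged sketch: neither line fixes the tensor-size error; both only address
# hints printed in the log above.

# Make Hydra print the full stack trace instead of the truncated one
# (equivalent to `set HYDRA_FULL_ERROR=1` in the Windows shell before running).
os.environ["HYDRA_FULL_ERROR"] = "1"

# Silence the Tensor Cores warning and trade a little precision for speed,
# as suggested by PyTorch Lightning.
torch.set_float32_matmul_precision("high")  # or "medium"
```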
I had the same problem. Check whether your torch version is below 2.0: the error occurs when the output dimension of `y` is set to 0, and that does not seem to work with torch versions below 2.0.
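
If upgrading torch is indeed the fix, a tiny pre-flight check turns the cryptic broadcast error into a readable message. This is only a sketch, assuming torch >= 2.0 is the requirement and using `packaging` purely for the version comparison:

```python
import torch
from packaging import version

# Sketch of a pre-flight check: broadcasting the zero-sized y tensor during
# DDP setup reportedly only fails on torch < 2.0, so fail early with a clear
# message instead of "The size of tensor a (128) must match ... (0)".
if version.parse(torch.__version__) < version.parse("2.0.0"):
    raise RuntimeError(
        f"Found torch {torch.__version__}; upgrading to torch >= 2.0 "
        "reportedly avoids the zero-dimensional y broadcast error."
    )
print(f"torch {torch.__version__} should handle a zero-dimensional y output.")
```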