DiGress
Reporting another problem with the code
Hello, I have this problem: `The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1`. How do I solve it?
Hello, are you using the latest version of the code, with the packages specified by the new requirements.txt? Which branch are you using? Thanks
Hi, I'm using the latest version of the code, but I'm running it on Windows. If I run main.py directly, do I need to pass any extra arguments? The error output is below. Thank you!
```
Found rdkit, all good
Dataset smiles were found.
E:\anaconda\envs\digress\lib\site-packages\torch\nn\init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Marginal distribution of the classes: tensor([0.7230, 0.1151, 0.1593, 0.0026]) for nodes, tensor([0.7261, 0.2384, 0.0274, 0.0081, 0.0000]) for edges
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
distributed_backend=nccl All distributed processes registered. Starting with 1 processes
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high')
which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Error executing job with overrides: []
Traceback (most recent call last):
File "E:\DiGress-main\src\main.py", line 202, in main
trainer.fit(model, datamodule=datamodule, ckpt_path=cfg.general.resume)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 531, in fit
call._call_and_handle_interrupt(
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\call.py", line 41, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 91, in launch
return function(*args, **kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 570, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 951, in _run
self.strategy.setup(self)
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 164, in setup
self.configure_ddp()
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 269, in configure_ddp
self.model = self._setup_model(_LightningModuleWrapperBase(self.model))
File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 183, in _setup_model
return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
File "E:\anaconda\envs\digress\lib\site-packages\torch\nn\parallel\distributed.py", line 657, in init
_sync_module_states(
File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 136, in _sync_module_states
_sync_params_and_buffers(
File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 154, in _sync_params_and_buffers
dist._broadcast_coalesced(
RuntimeError: The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
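
As a side note, the two hints printed in the log itself (the Tensor Cores warning and the `HYDRA_FULL_ERROR` suggestion) are unrelated to the crash, but applying them gives cleaner and more complete output. A minimal sketch, assuming these lines go at the very top of main.py (the environment variable can equally be set in the shell before launching):

```python
import os

import torch

# Hedged sketch: neither line fixes the tensor-size error; both only address
# hints printed in the log above.

# Make Hydra print the full stack trace instead of the truncated one
# (equivalent to `set HYDRA_FULL_ERROR=1` in the Windows shell before running).
os.environ["HYDRA_FULL_ERROR"] = "1"

# Silence the Tensor Cores warning and trade a little precision for speed,
# as suggested by PyTorch Lightning.
torch.set_float32_matmul_precision("high")  # or "medium"
```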
I had the same problem. Check whether your torch version is below 2.0: the error occurs when the output dimension of `y` is set to 0, and that does not seem to work with torch versions below 2.0.
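
If upgrading torch is indeed the fix, a tiny pre-flight check turns the cryptic broadcast error into a readable message. This is only a sketch, assuming torch >= 2.0 is the requirement and using `packaging` purely for the version comparison:

```python
import torch
from packaging import version

# Sketch of a pre-flight check: broadcasting the zero-sized y tensor during
# DDP setup reportedly only fails on torch < 2.0, so fail early with a clear
# message instead of "The size of tensor a (128) must match ... (0)".
if version.parse(torch.__version__) < version.parse("2.0.0"):
    raise RuntimeError(
        f"Found torch {torch.__version__}; upgrading to torch >= 2.0 "
        "reportedly avoids the zero-dimensional y broadcast error."
    )
print(f"torch {torch.__version__} should handle a zero-dimensional y output.")
```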