
CUDA out of memory

Open StefanIsSmart opened this issue 7 months ago • 4 comments

When reproducing your work, I keep running into CUDA out of memory errors.

Like this:

```
Traceback (most recent call last):
  File "run.py", line 62, in <module>
    trainer.train_and_test()
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 101, in train_and_test
    self.train()
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 83, in train
    trn_loss = self.train_iterations()
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 295, in train_iterations
    output = self.model(mol_batch).view(-1)
  File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/model.py", line 54, in forward
    xm, hm = self.mol_conv(xm, data_mol.edge_index, data_mol.edge_attr, h=hm, batch=data_mol.batch)
  File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 259, in forward
    x = self.conv(x, edge_index, edge_attr)
  File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 131, in forward
    return self.conv(x, edge_index, edge_attr)
  File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 40, in forward
    return self.propagate(edge_index, x=x, edge_attr=edge_attr, size=size)
  File "/tmp/layer_TripletMessage_propagate_6_8b4kbw.py", line 194, in propagate
    out = self.message(
  File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 49, in message
    alpha = (triplet * self.weight_triplet_att).sum(dim=-1)  # time consuming 12.14s
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 866.00 MiB (GPU 4; 23.64 GiB total capacity; 10.45 GiB already allocated; 31.56 MiB free; 10.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
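The error text itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch of that setting (the 128 MiB threshold is an assumed starting value to tune; the variable must be set before CUDA is initialized):

```python
import os

# Must be set before torch initializes CUDA (e.g. at the very top of run.py).
# max_split_size_mb caps the block size the caching allocator will split,
# which can reduce fragmentation; 128 here is an assumed starting value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import after the environment variable is set
```

Equivalently, from the shell: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python run.py`.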

How can I deal with this?
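For context, the failing allocation is the elementwise product `triplet * self.weight_triplet_att` in `message`, which materializes a full temporary tensor before `.sum(dim=-1)` reduces it. Assuming `weight_triplet_att` broadcasts along the last dimension (the shapes below are hypothetical, not GLAM's actual ones), the same contraction can be written as an einsum, which dispatches to a matmul kernel and avoids that large temporary:

```python
import torch

# Hypothetical shapes for illustration: E edges, H heads, D features.
E, H, D = 100_000, 4, 32
triplet = torch.randn(E, H, D)
weight_triplet_att = torch.randn(H, D)

# Pattern from the traceback: allocates an E x H x D temporary, then reduces.
alpha_ref = (triplet * weight_triplet_att).sum(dim=-1)

# Same result as a contraction over the last dim, without the big temporary.
alpha = torch.einsum("ehd,hd->eh", triplet, weight_triplet_att)

assert torch.allclose(alpha, alpha_ref, atol=1e-5)
```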

StefanIsSmart · Jul 04 '24 14:07