
Loading AlphaFoldDB triggers the cgroup out-of-memory handler

Open hxu105 opened this issue 1 year ago • 4 comments

Hi,

I am trying to use AlphaFoldDB as my training dataset, but the cgroup OOM handler is always triggered during the data loading stage. I have tried enabling lazy mode for data loading, but I still hit the same cgroup OOM issue.

I am also using some transforms for processing the dataset, like truncating proteins to a maximum length of 100.

I have pasted a screenshot and my dataset-loading configuration below.

image image

Any advice you can give me will be helpful!

hxu105 avatar Mar 20 '23 18:03 hxu105

Hi, this issue would be better raised in the GearNet repo. Typically, we load only one split (at most 22,000 proteins) at a time and cycle through the splits in round-robin fashion during pre-training. Does this solve your problem?
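For readers unfamiliar with the idea, here is a minimal sketch of round-robin loading, in which only one split is materialized at a time. It is not the GearNet code: `round_robin_splits` and the `load_split` callback are hypothetical names used for illustration, and the way a split is actually loaded is left to the caller.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def round_robin_splits(
    split_ids: Sequence[int],
    load_split: Callable[[int], List],
    num_rounds: int,
) -> Iterator[Tuple[int, List]]:
    """Cycle over the splits, loading each one only when it is needed.

    `load_split` is a hypothetical callback that returns the data of one split.
    """
    for _ in range(num_rounds):
        for split_id in split_ids:
            data = load_split(split_id)  # only this split is loaded here
            yield split_id, data
            # Drop the generator's reference before loading the next split.
            # The memory is freed only if the caller keeps no reference either.
            del data


if __name__ == "__main__":
    def fake_load(split_id: int) -> List[str]:
        # Stand-in for an expensive load; a real loader would read from disk.
        return [f"protein_{split_id}_{i}" for i in range(3)]

    for split_id, proteins in round_robin_splits([0, 1], fake_load, num_rounds=2):
        print(split_id, len(proteins))
```

Peak memory is then bounded by the largest single split rather than by the whole dataset, which is why the size of one split still matters.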

Oxer11 avatar Mar 20 '23 18:03 Oxer11

Thank you for your quick response. I am using the same script as GearNet, and I believe it already performs round-robin scheduling, since it loads one species at a time. However, it still hits the cgroup OOM problem. I have attached the code to this message.

image

hxu105 avatar Mar 20 '23 18:03 hxu105

Thanks. My guess is that a single split is still too large to fit in your memory. In that case, you may need a machine with more CPU memory. Otherwise, you will have to modify the source code of the AlphaFoldDB dataset. Here are some possible solutions:

  1. Reduce the split size to a smaller number.
  2. Remove all edges generated by RDKit. This works only because GearNet does not need bond edges and re-generates the edges on the GPU.
  3. Truncate each protein in load_pickle() as it is loaded.
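
The third suggestion can be sketched roughly as follows. This is not TorchDrug's `load_pickle()`: the on-disk layout (a record count followed by one pickled record per protein) and the names `dump_records` and `load_truncated` are assumptions made for illustration, and plain lists stand in for protein objects. The point is only that each record is truncated before it is stored, so the full-length proteins never accumulate in memory.

```python
import io
import pickle
from typing import BinaryIO, List, Tuple

Record = Tuple[str, List[int]]  # (name, residues); a stand-in for a protein object


def dump_records(fout: BinaryIO, records: List[Record]) -> None:
    """Write a count followed by one pickled record each (assumed layout)."""
    pickle.dump(len(records), fout)
    for record in records:
        pickle.dump(record, fout)


def load_truncated(fin: BinaryIO, max_length: int) -> List[Record]:
    """Read records one at a time, keeping at most `max_length` residues of each."""
    num_sample = pickle.load(fin)
    records = []
    for _ in range(num_sample):
        name, residues = pickle.load(fin)
        # Truncate before storing, so the full-length record can be freed.
        records.append((name, residues[:max_length]))
    return records


if __name__ == "__main__":
    buffer = io.BytesIO()
    dump_records(buffer, [("long", list(range(250))), ("short", list(range(40)))])
    buffer.seek(0)
    for name, residues in load_truncated(buffer, max_length=100):
        print(name, len(residues))
```

In a real dataset class the slice would be replaced by whatever truncation the protein object supports, applied at the same point in the loading loop.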

Oxer11 avatar Mar 20 '23 19:03 Oxer11

Great! Thank you for your help. I will try to work around the issue using your advice, and I will follow up on how effective these suggestions are once I have implemented them. Many thanks again!

hxu105 avatar Mar 20 '23 19:03 hxu105