Loading AlphaFoldDB triggers the cgroup out-of-memory handler
Hi,
I am trying to use AlphaFoldDB as my training dataset, but I always hit a cgroup out-of-memory (OOM) error during the data-loading stage. I have tried enabling lazy mode for the dataset, but the same cgroup OOM error still occurs.
I am also applying some transforms to the dataset, such as truncating proteins to a maximum length of 100 residues.
My configuration for loading the dataset is roughly as follows (the screenshot from the original post is not reproduced here).


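A minimal sketch of such a setup, assuming the standard `torchdrug` dataset and transform APIs (the path and species/split IDs below are placeholders):

```python
from torchdrug import datasets, transforms

# Cap each protein at 100 residues and expose a residue-level view.
transform = transforms.Compose([
    transforms.TruncateProtein(max_length=100, random=False),
    transforms.ProteinView(view="residue"),
])

dataset = datasets.AlphaFoldDB(
    "~/protein-datasets/alphafold/",  # placeholder path
    species_id=0,                     # placeholder species index
    split_id=0,                       # placeholder split index
    transform=transform,
    lazy=True,  # defer per-protein featurization until access
)
```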
Any advice you can give me will be helpful!
Hi, this issue is better raised in the GearNet repo. Typically, we only load one split (with at most 22,000 proteins) at a time and use a round-robin schedule over splits for pre-training. Will this solve your problem?
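For reference, a minimal sketch of that round-robin scheme, with hypothetical (species, split) pairs and a placeholder training step; the actual GearNet scripts differ in detail:

```python
from itertools import cycle

from torchdrug import datasets

# Hypothetical (species_id, split_id) pairs; each split holds
# at most ~22,000 proteins.
SPLITS = cycle([(0, 0), (0, 1), (1, 0)])
NUM_EPOCHS = 10  # placeholder

def train_one_epoch(dataset):
    """Placeholder for the actual GearNet pre-training step."""

for epoch in range(NUM_EPOCHS):
    species_id, split_id = next(SPLITS)
    # Only one split is resident in CPU memory at any time.
    dataset = datasets.AlphaFoldDB(
        "~/protein-datasets/alphafold/",  # placeholder path
        species_id=species_id,
        split_id=split_id,
    )
    train_one_epoch(dataset)
    del dataset  # release this split before loading the next one
```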
Thank you for your quick response. However, I am using the same code script as GearNet, and I believe it already performs round-robin scheduling, since it loads one species at a time. It still runs into the cgroup OOM problem. I have attached the code to this message.

Thanks. My guess is that a single split is still too large to fit in your memory. In that case, you may need a machine with more CPU memory. Otherwise, you will have to hack the source code of the AlphaFoldDB dataset. Here are some possible solutions:
- reduce the number of proteins per split
- remove all edges generated by RDKit. This works only because we don't need bond edges in GearNet and re-generate edges on the GPU.
- do truncation in `load_pickle()` when loading each protein, as sketched below
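A minimal sketch of the last suggestion, assuming `load_pickle()` populates `self.data` with `torchdrug.data.Protein` objects and that `Protein.subresidue()` is available (the subclass name and residue cap are placeholders):

```python
import torch
from torchdrug import datasets

class TruncatedAlphaFoldDB(datasets.AlphaFoldDB):
    """Hypothetical subclass that truncates proteins during loading,
    so full-length proteins do not stay resident in memory."""

    MAX_LENGTH = 100  # assumed residue cap, matching the transform above

    def load_pickle(self, pkl_file, *args, **kwargs):
        super().load_pickle(pkl_file, *args, **kwargs)
        # Assumes self.data holds torchdrug.data.Protein objects here.
        truncated = []
        for protein in self.data:
            if protein.num_residue > self.MAX_LENGTH:
                index = torch.arange(self.MAX_LENGTH, device=protein.device)
                protein = protein.subresidue(index)
            truncated.append(protein)
        self.data = truncated
```

Note that if peak memory is reached inside `load_pickle()` itself, the truncation (and any edge removal) would need to go inside its unpickling loop rather than after it.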
Great! Thank you for your help. I will try these suggestions and follow up on their effectiveness once I have implemented them. Many thanks again!