Reduce the peak memory in DistDGL
🔨Work Item
IMPORTANT:
- This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
We need to reduce the peak memory in DistDGL. Currently, it uses more memory than necessary.
- Avoid checking data correctness when loading the graph structure: https://github.com/dmlc/dgl/blob/master/python/dgl/distributed/partition.py#L100-L119 We can enable the check only in debug mode.
- When init_data is called, it copies data to shared memory. After calling init_data, we should drop the reference to the original tensor and run garbage collection to force Python to free the memory.
- The DGLGraph object carries some unnecessary node/edge data. One example is the 'orig_id' node data and edge data.
- The DGLGraph object has some fields, such as node type and edge type, that use too many bytes. For example, we could potentially store node type and edge type in 2 bytes each.
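A rough sketch of the second and fourth items, using only the Python standard library rather than DGL itself. The helper names `narrow_type_ids` and `copy_to_shared` are hypothetical, and `copy_to_shared` is only a stand-in for whatever copy init_data performs; the actual DistDGL code paths may look quite different.

```python
import gc
from array import array
from multiprocessing import shared_memory


def narrow_type_ids(type_ids):
    """Re-encode type IDs as 2-byte signed integers (hypothetical helper).

    array("h", ...) raises OverflowError if an ID does not fit in a C short.
    """
    return array("h", type_ids)


def copy_to_shared(values):
    """Copy a typed array into a new shared-memory segment.

    Hypothetical stand-in for the copy that init_data performs.
    """
    nbytes = len(values) * values.itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # The segment may be rounded up to a page size, so write only nbytes.
    shm.buf[:nbytes] = memoryview(values).cast("B")
    return shm, nbytes


# Type IDs as loaded: 8 bytes per entry.
node_types = array("q", [0, 1, 2, 1, 0])
compact = narrow_type_ids(node_types)
compact_itemsize = compact.itemsize  # 2 on mainstream platforms

shm, nbytes = copy_to_shared(compact)

# The shared segment now holds the data; drop the private copies and
# ask the collector to run so the memory can be released promptly.
del node_types, compact
gc.collect()

# Read back from shared memory to confirm the contents survived.
restored = array("h")
restored.frombytes(bytes(shm.buf[:nbytes]))

shm.close()
shm.unlink()
```

In CPython, `del` already frees an object once its last reference is gone; `gc.collect()` only helps when reference cycles keep the tensor alive, so whether the explicit collection is needed depends on how the tensor is referenced elsewhere.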
Depending work items or issues
Priority: to be confirmed with Da.
Hi Da, as we discussed, it is better to itemize the work so that we have a better understanding of the priority. For now, I am assigning this back to you and lowering the priority to Medium. Feel free to file new issues and change the priority back to High.
I have itemized the work. I think we only need to fix the 4 items I listed above.
Hi Rui, please re-estimate the workload based on the update.
Re-estimated, and a PR has been sent for review...