dgl
Investigate and improve graph load time
🔨Work Item
IMPORTANT:
- This template is only for the dev team to track project progress. For feature requests or bug reports, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
DGL loads graphs at roughly a quarter of the speed of PYG across all the graph counts tested below, and the absolute gap grows as the number of graphs increases. We should investigate why this happens and try to improve it.
DGL
Number of graphs | Size on disk (MB) | Load time (s) |
---|---|---|
100K | 105 | 42 |
1000K | 1047 | 422 |
5000K | 5235 | 2067 |
PYG
Number of graphs | Size on disk (MB) | Load time (s) |
---|---|---|
100K | 105 | 9.72 |
1000K | 1057 | 95.53 |
5000K | 5246 | 473.26 |
Depending work items or issues
I recall that saving/loading a single batched graph is more efficient than saving/loading a list of graphs, though I don't have detailed numbers.
@peizhou001 Please clarify the problem: is it about many graphs or a single graph? Let's decide on the plan later.