dgl
[Graphbolt] Offline script to convert from COO to CSC sampling graph.
🔨Work Item
The goal is for the converted graph to be loadable directly as a CSC sampling graph.
IMPORTANT:
- This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Depending work items or issues
I've noticed that there's already a CPU implementation in the DGL code.
I wonder if we need to follow this code? My understanding is that if our only requirement is to convert a single COO to CSC, this might be sufficient.
Could we offer a utility API to convert COO data (homogeneous or heterogeneous, possibly unsorted) into a sorted CSCSamplingGraph? Could we first construct a DGLGraph from the COO data and then convert it via dgl.to_homogeneous() to obtain the CSC matrix?
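For reference, the core COO-to-sorted-CSC step being discussed can be sketched without any library at all. This is only an illustration of the technique (a counting sort over destination nodes), not the DGL or GraphBolt implementation. The function name `coo_to_csc` is made up for this example, and it covers the homogeneous case only. A heterogeneous graph would first need its node IDs mapped into a single ID space.

```python
from typing import List, Sequence, Tuple


def coo_to_csc(
    src: Sequence[int], dst: Sequence[int], num_nodes: int
) -> Tuple[List[int], List[int]]:
    """Convert unsorted COO edges (src[i] -> dst[i]) to CSC arrays.

    Hypothetical helper for illustration; not part of DGL.
    Returns (indptr, indices): the in-neighbors of node v are
    indices[indptr[v]:indptr[v + 1]], in ascending order.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")

    # Count in-degrees, then prefix-sum them into column pointers.
    indptr = [0] * (num_nodes + 1)
    for v in dst:
        indptr[v + 1] += 1
    for v in range(num_nodes):
        indptr[v + 1] += indptr[v]

    # Visit edges in ascending src order so every column ends up sorted.
    order = sorted(range(len(src)), key=lambda e: src[e])
    next_slot = indptr[:-1]  # slicing copies: next free slot per column
    indices = [0] * len(src)
    for e in order:
        v = dst[e]
        indices[next_slot[v]] = src[e]
        next_slot[v] += 1
    return indptr, indices


# Edges: 2->0, 0->1, 1->0, 2->1
indptr, indices = coo_to_csc([2, 0, 1, 2], [0, 1, 0, 1], num_nodes=3)
print(indptr, indices)  # [0, 2, 4, 4] [1, 2, 0, 2]
```

A real offline script would presumably do the same thing with tensor operations (sort by destination, then by source) rather than Python loops, but the resulting `indptr`/`indices` layout is the same.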
The PR "[Graphbolt] Add the preprocess_ondisk_dataset function." implemented a `preprocess_ondisk_dataset()` function that:
- Receives the `input_config_path` (a YAML file), extracts the data from the `graph` field, and uses the original data in `.csv`/`.npy` format to create a `CSCSamplingGraph`.
- Converts all `torch` format data into `numpy` format.
- Returns the processed YAML file path.
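To make the first step concrete, here is a rough sketch of pulling an edge list out of a config's `graph` field and reading it from CSV into COO form. It is not the code from the PR. The key layout `config["graph"]["edges"]["path"]`, the headerless two-column CSV format, and both helper names are assumptions made for this example, and a plain dict stands in for the parsed YAML file.

```python
import csv
import os
import tempfile
from typing import Dict, List, Tuple


def read_edges_csv(path: str) -> Tuple[List[int], List[int]]:
    """Read a headerless two-column CSV of `src,dst` pairs into COO lists."""
    src, dst = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            src.append(int(row[0]))
            dst.append(int(row[1]))
    return src, dst


def load_graph_edges(config: Dict, base_dir: str) -> Tuple[List[int], List[int]]:
    """Resolve the edge file named in the config's graph field and load it."""
    edge_file = config["graph"]["edges"]["path"]  # hypothetical key layout
    return read_edges_csv(os.path.join(base_dir, edge_file))


# Usage with a throwaway directory standing in for the dataset folder.
with tempfile.TemporaryDirectory() as base_dir:
    with open(os.path.join(base_dir, "edges.csv"), "w", newline="") as f:
        csv.writer(f).writerows([(2, 0), (0, 1), (1, 0)])
    config = {"graph": {"edges": {"path": "edges.csv"}}}  # stand-in for parsed YAML
    src, dst = load_graph_edges(config, base_dir)
    print(src, dst)  # [2, 0, 1] [0, 1, 0]
```

The resulting `src`/`dst` lists are the COO input that would then be converted to CSC and saved alongside an updated config.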