dgl
[Graphbolt] Offline script to convert from COO to CSC sampling graph.
🔨Work Item
So that it can be loaded directly into a CSC sampling graph.
IMPORTANT:
- This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Depending work items or issues
I've noticed that there's a CPU implementation in the DGL code. I wonder whether we need to follow it? My understanding is that if our only requirement is to convert a single COO to CSC, this might be sufficient.
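To make the single-graph case concrete, here is a minimal NumPy sketch of a COO to CSC conversion. It is not the DGL CPU implementation referred to above; the function name `coo_to_csc` and its return layout are assumptions made for illustration only.

```python
import numpy as np

def coo_to_csc(src, dst, num_nodes):
    """Convert an unsorted COO edge list to CSC arrays (hypothetical helper).

    Returns (indptr, indices, edge_ids). For destination node v, its
    in-neighbors are indices[indptr[v]:indptr[v + 1]], and edge_ids maps
    each CSC slot back to the position of that edge in the COO input.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    # A stable sort by destination keeps the original edge order per column.
    order = np.argsort(dst, kind="stable")
    # Count in-degrees, then prefix-sum them into column pointers.
    counts = np.bincount(dst, minlength=num_nodes)
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    np.cumsum(counts, out=indptr[1:])
    return indptr, src[order], order

# Edges: 0->1, 2->0, 1->2, 0->2, 2->1
src = [0, 2, 1, 0, 2]
dst = [1, 0, 2, 2, 1]
indptr, indices, edge_ids = coo_to_csc(src, dst, num_nodes=3)
```

Keeping `edge_ids` around would let edge features stored in COO order be looked up after the conversion.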
Could we offer a utility API to convert COO (which could be homogeneous or heterogeneous, unsorted) to a sorted CSCSamplingGraph? Could we first construct a DGLGraph from the COO data and convert it via dgl.to_homogeneous() to obtain the CSC matrix?
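As a rough illustration of what such a utility might do for the heterogeneous case, the sketch below merges per-type COO edge lists into one homogeneous ID space and emits a CSC layout sorted by destination, then by edge type. It uses only NumPy and does not call dgl.to_homogeneous(); the function name, the input dictionaries, and the alphabetical ordering of node and edge types are all assumptions of this sketch, not DGL behavior.

```python
import numpy as np

def hetero_coo_to_homo_csc(num_nodes_per_type, edges):
    """Hypothetical helper: heterogeneous COO -> homogeneous sorted CSC.

    num_nodes_per_type: dict mapping node type -> node count.
    edges: dict mapping (src_type, edge_type, dst_type) -> (src_ids, dst_ids),
           unsorted COO with type-local node IDs.
    Returns (indptr, indices, edge_type_ids).
    """
    # Give each node type a contiguous block of homogeneous IDs.
    offsets, total = {}, 0
    for ntype in sorted(num_nodes_per_type):
        offsets[ntype] = total
        total += num_nodes_per_type[ntype]

    src_all, dst_all, etype_all = [], [], []
    for etype_id, key in enumerate(sorted(edges)):
        src_type, _, dst_type = key
        src, dst = edges[key]
        src_all.append(np.asarray(src, dtype=np.int64) + offsets[src_type])
        dst_all.append(np.asarray(dst, dtype=np.int64) + offsets[dst_type])
        etype_all.append(np.full(len(src), etype_id, dtype=np.int64))
    src_all = np.concatenate(src_all)
    dst_all = np.concatenate(dst_all)
    etype_all = np.concatenate(etype_all)

    # lexsort treats the last key as primary: sort by destination,
    # then by edge type within each column.
    order = np.lexsort((etype_all, dst_all))
    indptr = np.zeros(total + 1, dtype=np.int64)
    np.cumsum(np.bincount(dst_all, minlength=total), out=indptr[1:])
    return indptr, src_all[order], etype_all[order]

# Two items (homogeneous IDs 0-1) and two users (homogeneous IDs 2-3).
num_nodes = {"item": 2, "user": 2}
hetero_edges = {
    ("user", "follows", "user"): ([0], [1]),
    ("item", "bought_by", "user"): ([1, 0], [1, 0]),
}
h_indptr, h_indices, h_etypes = hetero_coo_to_homo_csc(num_nodes, hetero_edges)
```

Sorting by edge type within each column is one possible reading of "sorted" here; whether CSCSamplingGraph expects that ordering is not stated in this thread.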
The PR "[Graphbolt] Add the preprocess_ondisk_dataset function." implemented a preprocess_ondisk_dataset() function, which will:
- Receive the input_config_path (a YAML file), extract the data from the graph field, and use the original data in .csv/.npy format to create a CSCSamplingGraph.
- Convert all torch format data into numpy format.
- Return the processed YAML file path.