cugraph
cugraph copied to clipboard
Use category dtype for type in PropertyGraph
Closes #2399
This is ~not yet~ now implemented for MG.
Codecov Report
Base: 59.94% // Head: 59.45% // Decreases project coverage by -0.49%
:warning:
Coverage data is based on head (
a9b6681
) compared to base (94bc5ba
). Patch coverage: 43.58% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## branch-22.10 #2510 +/- ##
================================================
- Coverage 59.94% 59.45% -0.50%
================================================
Files 111 111
Lines 6219 6279 +60
================================================
+ Hits 3728 3733 +5
- Misses 2491 2546 +55
Impacted Files | Coverage Δ | |
---|---|---|
...ugraph/cugraph/dask/structure/mg_property_graph.py | 13.54% <6.38%> (-0.40%) |
:arrow_down: |
python/cugraph/cugraph/structure/property_graph.py | 97.21% <100.00%> (+0.16%) |
:arrow_up: |
...h/cugraph/gnn/pyg_extensions/data/cugraph_store.py | 72.72% <0.00%> (-8.09%) |
:arrow_down: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
We should hold off on MG until the two cudf
issues above are fixed. I can push MG changes to a different branch while we wait.
This PR also opens up an object model possibility. Currently, we can get the type names of a property graph via pg.vertex_types
and pg.edge_types
. These are inferred from the data, so there must be at least one vertex or edge with each type. What if the user knows upfront what most or all of the types will be, and wants to set these values before the data is loaded? This information would let us be more efficient with categorical dtypes for the type column, because there would be less of a need to add categories to the categorical dtype as new data comes in. So, should we allow users to add types to the property graph? This allows types to exist that have no data yet. I'm inclined to say "not yet".
So, should we allow users to add types to the property graph? This allows types to exist that have no data yet. I'm inclined to say "not yet
I think "not yet" is the correct assumption. I think the main focus is to make the PGs efficient (both memory and computation) once the graph is created and not optimize the graph creation process.
rerun tests
👍
rerun tests
rerun tests
Using categoricals for the type changes the behavior of renumber_edges_by_type
and renumber_vertices_by_type
. Previously, we would sort using on the name of the type. Now, we sort on the categorical value of the type, which is not alphabetical.
This probably isn't a problem, since we give the start and end for each type, and otherwise the behavior is well-behaved and correct.
rerun tests
rerun tests
rerun tests
@gpucibot merge