cugraph
[FEA] Use category dtype in PropertyGraph DataFrames
Describe the solution you'd like and any additional context
We should use the category dtype for the _TYPE_ column, rather than string, in both _edge_prop_dataframe and _vertex_prop_dataframe.
This will have the following benefits:
- Saves memory (see the example below for motivation)
- Faster operations, because the column is backed by numerical codes rather than strings
- Aligns us with graph frameworks like DGL, which also use category-like dtypes to store _TYPE_ under the hood.
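
To illustrate the "numerical dtypes" point, here is a minimal sketch of what a categorical column stores. It uses pandas as a stand-in for cudf (an assumption made so the snippet runs without a GPU), and the type names other than `referrals` are made up for illustration.

```python
import pandas as pd

# Toy _TYPE_ column; "transactions" and "relationships" are illustrative values.
types = pd.Series(
    ["referrals", "transactions", "referrals", "relationships"], name="_TYPE_"
)
as_cat = types.astype("category")

# Each row holds a small integer code; the strings are stored once as categories.
codes = as_cat.cat.codes
categories = as_cat.cat.categories

# Filtering by type still reads the same as with a string column.
type_mask = as_cat == "referrals"

print(codes.dtype, list(codes), list(categories))
```

Comparisons and group-bys on such a column can work on the integer codes, which is where the expected speedup would come from.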
Motivating example for memory usage:
edge_prop_dataframe
_SRC_ | _DST_ | _EDGE_ID_ | _TYPE_ | user_id_1 | user_id_2 | merchant_id | stars | relationship_type | user_id | volume | time | card_num
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
89216 | 78634 | 0 | referrals | 89216 | 78634 | 11 | 5 | <NA> | <NA> | <NA> | <NA> | <NA>
89216 | 78634 | 0 | referrals | 89216 | 78634 | 11 | 5 | <NA> | <NA> | <NA> | <NA> | <NA>
89216 | 78634 | 0 | referrals | 89216 | 78634 | 11 | 5 | <NA> | <NA> | <NA> | <NA> | <NA>
get_size(edge_prop_dataframe['_TYPE_'].memory_usage())
'3.0 GB'
get_size(edge_prop_dataframe['_TYPE_'].astype('category').memory_usage())
'1.17 GB'
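
The comparison above can be reproduced at small scale with the sketch below. `get_size` is not defined in this issue, so the version here is a hypothetical stand-in that formats a byte count; pandas again substitutes for cudf, and the synthetic column is an assumption, not the data used above.

```python
import pandas as pd


def get_size(num_bytes):
    """Hypothetical stand-in for the helper used above: format a byte count."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.2f} TB"


# Synthetic _TYPE_ column with two distinct values repeated many times.
type_col = pd.Series(
    ["referrals"] * 500_000 + ["transactions"] * 500_000, name="_TYPE_"
)

# deep=True makes pandas count the string payloads, not just the pointers.
string_bytes = type_col.memory_usage(deep=True)
category_bytes = type_col.astype("category").memory_usage(deep=True)

print(get_size(string_bytes), "->", get_size(category_bytes))
```

The savings grow with the number of rows per distinct type, since each distinct string is stored only once in the categorical version.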
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.