pygraphistry
Consider namespacing _dbscan column to avoid user data conflicts
Priority
Low-priority enhancement
Background
PR #776 fixed identifier conflicts where GFQL operations failed when user graphs had columns named index. However, we deliberately did NOT fix the _dbscan column conflict.
Current Behavior
When users call .dbscan() clustering operations, GFQL silently overwrites any pre-existing _dbscan column in the user's graph.
Decision Rationale
- _dbscan is currently part of the user-facing output API
- Changing it to __gfql_dbscan__ would be a breaking change
- Users should not have pre-existing _dbscan columns when calling .dbscan()
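
Since the rationale puts the burden on callers, a caller-side guard may be useful. The following is a minimal sketch, not part of pygraphistry: the helper name check_no_reserved_columns is hypothetical, and it only inspects a list of column names, so it makes no assumptions about the library's internals.

```python
from typing import Iterable, Tuple

# Column name that .dbscan() writes, per this issue.
RESERVED_COLUMNS: Tuple[str, ...] = ("_dbscan",)


def check_no_reserved_columns(columns: Iterable[str],
                              reserved: Iterable[str] = RESERVED_COLUMNS) -> None:
    """Raise ValueError if any reserved output column already exists.

    Hypothetical caller-side helper: run it on the node/edge column names
    before clustering so an overwrite fails loudly instead of silently.
    """
    clashes = sorted(set(columns) & set(reserved))
    if clashes:
        raise ValueError(
            f"Columns {clashes} would be overwritten by clustering; "
            "rename them first."
        )
```

With a pandas DataFrame, this could be called as check_no_reserved_columns(nodes_df.columns) before invoking .dbscan(), and a clashing column could then be moved aside with nodes_df.rename(columns={"_dbscan": "my_dbscan"}).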
Test Reference
See test_user_dbscan_column_preserved_after_cluster in graphistry/tests/compute/test_identifier_conflicts.py
This test is intentionally skipped and documents our decision NOT to fix this conflict.
Locations
graphistry/compute/cluster.py:182,185,418,420
Potential Future Work
If we decide to namespace this in a future major version:
- Change to __gfql_dbscan__ (with auto-increment collision avoidance)
- Document as breaking change in migration guide
- Provide deprecation warning in intermediate release
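
The "auto-increment collision avoidance" in the first item could be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: the function name pick_namespaced_column and the numbered suffix format (__gfql_dbscan_1__, __gfql_dbscan_2__, ...) are hypothetical; only the base name __gfql_dbscan__ comes from this issue.

```python
from typing import Iterable


def pick_namespaced_column(columns: Iterable[str], name: str = "dbscan") -> str:
    """Return an internal column name that does not clash with `columns`.

    Tries __gfql_<name>__ first, then __gfql_<name>_1__, __gfql_<name>_2__, ...
    (the numbered suffix format is an assumption made for this sketch).
    """
    taken = set(columns)
    candidate = f"__gfql_{name}__"
    i = 0
    while candidate in taken:
        i += 1
        candidate = f"__gfql_{name}_{i}__"
    return candidate
```

One consequence of this design is that the output column name is no longer fixed, so callers would need a way to learn which name was chosen, for example by having the clustering call report it.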