Consider namespacing _dbscan column to avoid user data conflicts

Open: lmeyerov opened this issue 2 months ago

Priority

Low-priority enhancement

Background

PR #776 fixed identifier conflicts in which GFQL operations failed when a user's graph had a column named "index". However, we deliberately did NOT fix the analogous _dbscan column conflict.

Current Behavior

When a user calls the .dbscan() clustering operation, GFQL silently overwrites any pre-existing _dbscan column in the user's graph with the new cluster labels.
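A minimal pandas sketch of what "silently overwrites" means for the user. It assumes the labels are written with a plain column assignment; the actual write in graphistry/compute/cluster.py may differ. fake_labels is a hypothetical stand-in for real DBSCAN output.

```python
import pandas as pd

# A user's node table that already carries its own _dbscan column
nodes = pd.DataFrame({
    "id": ["a", "b", "c"],
    "_dbscan": ["keep-me", "keep-me", "keep-me"],
})

# Hypothetical stand-in for DBSCAN output: one label per row (-1 = noise)
fake_labels = [0, 0, -1]

# Assumed write pattern: plain assignment replaces the existing column,
# with no warning and no copy of the user's original values
nodes["_dbscan"] = fake_labels

print(nodes["_dbscan"].tolist())  # [0, 0, -1]; the "keep-me" values are gone
```

The column list is unchanged (still id and _dbscan), which is why nothing signals to the user that their data was replaced.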

Decision Rationale

_dbscan is currently part of the user-facing output API
Changing it to __gfql_dbscan__ would be a breaking change
Users should not have pre-existing _dbscan columns when calling .dbscan()

Test Reference

See test_user_dbscan_column_preserved_after_cluster in graphistry/tests/compute/test_identifier_conflicts.py

This test is intentionally skipped and documents our decision NOT to fix this conflict.

Locations

graphistry/compute/cluster.py:182,185,418,420

Potential Future Work

If we decide to namespace this in a future major version:

Change to __gfql_dbscan__ (with auto-increment collision avoidance)
Document as breaking change in migration guide
Provide deprecation warning in intermediate release
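A rough sketch of the first and third items above, under stated assumptions: unique_column_name and warn_legacy_dbscan_column are hypothetical helpers, not existing graphistry functions, and the numeric-suffix scheme (__gfql_dbscan__1, __gfql_dbscan__2, ...) is just one possible form of auto-increment collision avoidance.

```python
import warnings
from typing import Iterable

BASE_NAME = "__gfql_dbscan__"  # namespaced name proposed in this issue


def unique_column_name(existing: Iterable[str], base: str = BASE_NAME) -> str:
    """Return `base` if unused, else `base` plus the smallest free numeric suffix.

    Hypothetical helper: the suffix format is an assumption.
    """
    taken = set(existing)
    if base not in taken:
        return base
    i = 1
    while f"{base}{i}" in taken:
        i += 1
    return f"{base}{i}"


def warn_legacy_dbscan_column() -> None:
    """Hypothetical notice for an intermediate release, before the rename lands."""
    warnings.warn(
        "The _dbscan output column may be renamed in a future major version.",
        FutureWarning,
        stacklevel=2,
    )


print(unique_column_name(["id", "_dbscan"]))                      # __gfql_dbscan__
print(unique_column_name(["__gfql_dbscan__", "__gfql_dbscan__1"]))  # __gfql_dbscan__2
```

FutureWarning is used here rather than DeprecationWarning because Python's default filters show FutureWarning to end users, while DeprecationWarning is hidden unless triggered from __main__ or enabled explicitly.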

Oct 13 '25 00:10 lmeyerov