Consider namespacing _dbscan column to avoid user data conflicts

Open lmeyerov opened this issue 2 months ago • 0 comments

Priority

Low-priority enhancement

Background

PR #776 fixed identifier conflicts where GFQL operations failed when user graphs had columns named index. However, we deliberately did NOT fix the _dbscan column conflict.

Current Behavior

When a user calls the .dbscan() clustering operation, GFQL silently overwrites any pre-existing _dbscan column in the user's graph.
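The overwrite can be illustrated without pygraphistry. The sketch below uses plain pandas only; `label_clusters` is a hypothetical stand-in for whatever step writes the cluster labels, not the actual code in cluster.py.

```python
import pandas as pd

def label_clusters(nodes: pd.DataFrame, labels) -> pd.DataFrame:
    """Hypothetical stand-in for a clustering step that writes labels
    to a fixed output column name."""
    out = nodes.copy()
    # Plain column assignment replaces an existing column without any warning.
    out["_dbscan"] = labels
    return out

# User data that already has a column named _dbscan
nodes = pd.DataFrame({
    "id": ["a", "b", "c"],
    "_dbscan": ["keep", "me", "please"],
})

result = label_clusters(nodes, [0, 0, 1])
# result["_dbscan"] now holds the labels; the user's values are gone from the result.
```

The input frame is untouched here because the sketch copies it, but the returned frame no longer carries the user's original values.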

Decision Rationale

  • _dbscan is currently part of the user-facing output API
  • Changing it to __gfql_dbscan__ would be a breaking change
  • Users should not have pre-existing _dbscan columns when calling .dbscan()
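Until the column is namespaced, one possible workaround is to rename a pre-existing _dbscan column before clustering. This is a minimal pandas sketch; the helper `stash_column` and the backup name `user_dbscan` are assumptions for illustration and are not part of pygraphistry.

```python
import pandas as pd

def stash_column(df: pd.DataFrame,
                 name: str = "_dbscan",
                 backup: str = "user_dbscan") -> pd.DataFrame:
    """Hypothetical helper: move a user column out of the way so a later
    step that writes to `name` does not clobber it."""
    if name not in df.columns:
        return df  # nothing to protect
    if backup in df.columns:
        raise ValueError(f"backup column {backup!r} already exists")
    return df.rename(columns={name: backup})

nodes = pd.DataFrame({"id": ["a", "b"], "_dbscan": [7, 9]})
safe_nodes = stash_column(nodes)
# safe_nodes can now be bound to the graph before calling .dbscan()
```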

Test Reference

See test_user_dbscan_column_preserved_after_cluster in graphistry/tests/compute/test_identifier_conflicts.py

This test is intentionally skipped and documents our decision NOT to fix this conflict.

Locations

  • graphistry/compute/cluster.py:182,185,418,420

Potential Future Work

If we decide to namespace this in a future major version:

  • Change to __gfql_dbscan__ (with auto-increment collision avoidance)
  • Document as a breaking change in the migration guide
  • Provide a deprecation warning in an intermediate release
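The collision avoidance and deprecation warning listed above could look roughly like the following. This is only a sketch under stated assumptions: the function names, the `__gfql_dbscan_N__` suffix scheme, and the warning text are all hypothetical and do not reflect a decided design.

```python
import warnings

def pick_free_column(columns, base="__gfql_dbscan__"):
    """Return `base` if unused, else the first free numbered variant.
    The numbering scheme (__gfql_dbscan_1__, __gfql_dbscan_2__, ...) is an assumption."""
    taken = set(columns)
    if base not in taken:
        return base
    stem = base.rstrip("_")  # "__gfql_dbscan"
    n = 1
    while f"{stem}_{n}__" in taken:
        n += 1
    return f"{stem}_{n}__"

def warn_if_overwriting(columns, legacy="_dbscan"):
    """Hypothetical intermediate-release check: warn when the user's data
    already has the legacy output column that clustering will overwrite."""
    if legacy in columns:
        warnings.warn(
            f"Existing column {legacy!r} will be overwritten; "
            "a future major version may write to a namespaced column instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```

For example, `pick_free_column(["id", "__gfql_dbscan__"])` returns `"__gfql_dbscan_1__"`.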

lmeyerov, Oct 13 '25 00:10