
Add auto_coerce option to handle mixed-type columns during Arrow conversion

Open lmeyerov opened this issue 1 month ago • 1 comment

Summary

When converting DataFrames to Arrow format for plotting, PyArrow fails on columns with mixed types (e.g., bytes and floats in the same column). This commonly occurs when columns have NaN values (float) mixed with other types like bytes/strings, resulting in cryptic ArrowTypeError messages.

Problem

Users encounter errors like:

pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'float' object", 'Conversion failed for column amount with type object')

The error occurs in PlotterBase._table_to_arrow() when calling pa.Table.from_pandas(). The error message doesn't clearly indicate which column is problematic or how to fix it.

Current Workaround

Users must manually debug by testing each column:

import pyarrow as pa

for c in df.columns:
    print('trying col', c)
    pa.Table.from_pandas(df[[c]])  # raises on the first column that cannot be converted

Then fix problematic columns with:

df[c_bad] = df[c_bad].astype(str)

Proposed Solution

Add a strict=False / auto_coerce=True parameter that:

  1. Catches mixed-type conversion errors - Wrap the Arrow conversion in try/except
  2. Auto-coerces problematic columns - Convert mixed-type columns to string as a fallback
  3. Emits warnings - Log which columns were coerced so users are aware
  4. Provides better error messages - When strict mode fails, indicate which column(s) failed and suggest fixes

API Options

Option A - Parameter on plot():

g.plot(df, auto_coerce=True)  # default could be True for convenience

Option B - Global setting:

graphistry.settings(auto_coerce_types=True)

Option C - Both (parameter overrides global)
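For Option C, the override logic might look like the sketch below. All names here (`_GLOBAL_SETTINGS`, `settings`, `resolve_auto_coerce`) are hypothetical stand-ins and do not reflect the existing `graphistry` API; the only point is that `None` means "defer to the global setting".

```python
from typing import Optional

# Hypothetical module-level settings store; the default value is an assumption.
_GLOBAL_SETTINGS = {"auto_coerce_types": True}


def settings(auto_coerce_types: Optional[bool] = None) -> None:
    """Update the global default; None leaves it unchanged."""
    if auto_coerce_types is not None:
        _GLOBAL_SETTINGS["auto_coerce_types"] = auto_coerce_types


def resolve_auto_coerce(auto_coerce: Optional[bool] = None) -> bool:
    """An explicit per-call argument wins; otherwise use the global setting."""
    if auto_coerce is None:
        return _GLOBAL_SETTINGS["auto_coerce_types"]
    return auto_coerce
```

Using `None` as the parameter default lets `plot()` distinguish "caller did not say" from an explicit `False`.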

Impact

This is a common pain point for users working with "dirty" real-world data. Graphistry already handles many dirty data cases (detecting/synthesizing missing nodes, guessing time columns), so auto-coercing mixed-type columns would be consistent with that philosophy.

Environment

  • Affects: PlotterBase._table_to_arrow() (line ~2139)
  • Related to pandas DataFrame → PyArrow Table conversion
  • Common trigger: columns with NaN (float) mixed with bytes/strings

Related Discussion

From Slack discussion with @rjurney - varying/mixed data types are a frequent source of Graphistry errors for users.

lmeyerov, Dec 10 '25 07:12

This likely also affects other calls that trigger an upload, like upload().

Unclear how deep, or where, to thread this option.

Also, there are other ways graphs can be dirty. Should this be scoped to just pandas → Arrow conversion, with the rest left as the server's problem? Later, we can add a strict=true/false option to the server too.

Probably expose this helper as well, e.g., g.to_arrow().

lmeyerov, Dec 10 '25 14:12