
feat: avoid pandas converting float("nan") to NULL in memtable

Open NickCrews opened this issue 8 months ago • 0 comments

Is your feature request related to a problem?

pandas treats NaN as NULL. Because memtable creation builds a pandas DataFrame, a user who passes float("nan") gets a NULL back. In my opinion, the ideal behavior is that they get a true NaN. Possibly related: a comment in duckdb.

ibis.memtable({"f": [None, float("-inf"), 3.0, float("inf"), float("nan")]}).f
┏━━━━━━━━━┓
┃ f       ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│    NULL │
│    -inf │
│     3.0 │
│     inf │
│    NULL │
└─────────┘

What is the motivation behind your request?

This came up for me when I wanted to test NaN vs NULL behavior in https://github.com/ibis-project/ibis/issues/11029, but it seems like a basic IO operation we should support.

Describe the solution you'd like

pyarrow does the conversion right. Could we use that and avoid pandas? It looks like it would be a hassle, because we currently rely on pandas.DataFrame(), a catchall that accepts many different shapes of data. If we used pyarrow, we would have to determine which of pa.Table.from_pydict, pa.Table.from_pylist, etc. to use, and even then there might be formats we don't support. Of course, for unsupported formats we could raise an error telling the user to call ibis.memtable(pd.DataFrame(your_data)) themselves, at which point they are responsible for the weirdness of pandas.

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb, but I think this affects all backends

Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

NickCrews, Apr 03 '25 17:04