ibis
feat: avoid pandas converting float("nan") to NULL in memtable
Is your feature request related to a problem?
In pandas, NaN is treated as NULL. Because we use pandas to build a DataFrame during memtable creation, a user who passes float("nan") gets a NULL back. In my opinion, the ideal behavior would be that they get a true NaN. Maybe related comment in duckdb.
ibis.memtable({"f": [None, float("-inf"), 3.0, float("inf"), float("nan")]}).f
┏━━━━━━━━━┓
┃ f ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│ NULL │
│ -inf │
│ 3.0 │
│ inf │
│ NULL │
└─────────┘
What is the motivation behind your request?
This came up for me when I wanted to test NaN vs NULL behavior in https://github.com/ibis-project/ibis/issues/11029, but it seems like a basic IO operation we should support.
Describe the solution you'd like
pyarrow handles this conversion correctly. Could we use it and avoid pandas? It looks like it would be a hassle, because we currently use pandas.DataFrame(), a catch-all that accepts many different shapes of data. If we used pyarrow, we would have to determine which of pa.Table.from_pydict, pa.Table.from_pylist, etc. to use. Even then, there might be formats we don't support. Of course, if we don't support them, we could raise an error and tell users to call ibis.memtable(pd.DataFrame(your_data)) themselves, in which case they are responsible for the weirdness of pandas.
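To make the dispatch idea concrete, here is a rough sketch of what choosing a pyarrow constructor could look like. The helper names `pick_constructor` and `to_arrow_table` are hypothetical and not part of ibis; the sketch only handles a dict of columns and a non-empty list of row dicts, and it assumes every other shape should be rejected with a pointer to the pandas workaround.

```python
from collections.abc import Mapping, Sequence


def pick_constructor(data):
    """Return the name of a pa.Table constructor that fits `data`, or None.

    Hypothetical helper: only two input shapes are recognized here.
    """
    # {"col": [values, ...]} -> column-oriented input
    if isinstance(data, Mapping):
        return "from_pydict"
    # [{"col": value}, ...] -> row-oriented input; strings are sequences too,
    # so exclude them, and conservatively reject empty lists.
    if (
        isinstance(data, Sequence)
        and not isinstance(data, (str, bytes))
        and len(data) > 0
        and all(isinstance(row, Mapping) for row in data)
    ):
        return "from_pylist"
    return None


def to_arrow_table(data):
    """Build a pyarrow Table from `data` without going through pandas."""
    import pyarrow as pa  # imported lazily so the shape check has no dependency

    name = pick_constructor(data)
    if name is None:
        raise TypeError(
            "unsupported input shape; build the table yourself with "
            "ibis.memtable(pd.DataFrame(your_data))"
        )
    return getattr(pa.Table, name)(data)
```

With this sketch, to_arrow_table({"f": [None, float("nan")]}) should produce a float column holding one null and one NaN, since pyarrow maps None to null and keeps a float NaN as an ordinary value. A real implementation would need to cover more of the shapes that pandas.DataFrame() accepts today.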
What version of ibis are you running?
main
What backend(s) are you using, if any?
duckdb, but I think this should affect all backends
Code of Conduct
- [x] I agree to follow this project's Code of Conduct