Raise an error when constructing a Series or DataFrame with mixed types (e.g. string + number)

Open Wainberg opened this issue 2 years ago • 5 comments

Description

I recently found a bug in my own code where I constructed a DataFrame with a mix of integers and strings, and the integers got set to null. Here's a simple illustration:

>>> pl.Series([1, '2'])
shape: (2,)
Series: '' [str]
[
        null
        "2"
]
>>> pl.DataFrame([1, '2'])
 column_0
 null
 2
shape: (2, 1)

Three other options here are to 1) convert everything to dtype=object (pandas's solution, but highly inefficient), 2) automatically upcast everything to a string, and 3) raise an error. I'm a big fan of raising an error here and letting the user decide whether they want to convert the integers to strings, set them to null, or take some other action.
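As a rough illustration of option 3, a caller could check the input before handing it to the constructor. This is only a sketch: `ensure_single_type` is a hypothetical helper, not part of polars, and it uses only the Python standard library. It treats `None` as a legitimate missing value and is deliberately strict, so it also rejects mixes such as int + float or int + bool.

```python
from typing import Any, Iterable, List


def ensure_single_type(values: Iterable[Any]) -> List[Any]:
    """Hypothetical helper: return the values as a list, or raise on mixed types."""
    items = list(values)
    # None is allowed alongside any type, since it represents a missing value.
    seen = {type(v) for v in items if v is not None}
    if len(seen) > 1:
        names = sorted(t.__name__ for t in seen)
        raise TypeError(f"mixed types in input: {names}")
    return items
```

Under this sketch, `ensure_single_type([1, '2'])` raises a `TypeError` instead of letting the integer silently become null, and the caller then decides whether to stringify, drop, or null the offending values before constructing the Series.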

One of the beautiful things about polars is that it makes it much harder to accidentally introduce missing values than pandas does, where pretty much every operation performs an implicit outer join! Avoiding implicit conversions to null during Series/DataFrame construction would further reduce the potential for missing-value bugs.

Edit: this also happens here:

>>> pl.Series([1, 2, 3], dtype=pl.String)
shape: (3,)
Series: '' [str]
[
        null
        null
        null
]

pandas converts to string in this situation:

>>> pd.Series([1, 2, 3], dtype=str)[0]
'1'
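If string output is what the caller actually wants, one way to avoid depending on the constructor's behavior is to convert the values explicitly first. A minimal sketch, assuming a hypothetical helper named `to_strings` (not a polars or pandas function) that keeps `None` as a missing value:

```python
from typing import Any, Iterable, List, Optional


def to_strings(values: Iterable[Any]) -> List[Optional[str]]:
    """Hypothetical helper: stringify every non-null value, keeping None as None."""
    return [None if v is None else str(v) for v in values]
```

The resulting list contains only strings and `None`, so passing it to `pl.Series(..., dtype=pl.String)` should not involve any implicit conversion.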

Wainberg avatar Sep 16 '23 21:09 Wainberg

I think this is very similar to https://github.com/pola-rs/polars/issues/11009.

We really should do a pass on the Python -> Polars parsing to make it more restrictive by default, instead of silently casting/nulling/truncating values.

orlp avatar Sep 18 '23 12:09 orlp

@stinodego thoughts on polars's behavior of auto-converting pl.Series([1, '2']) to pl.Series([None, '2'])? I'd argue this should be an error.

Wainberg avatar Dec 31 '23 18:12 Wainberg

It should either raise or cast to string, not sure which.

stinodego avatar Jan 08 '24 23:01 stinodego

@stinodego I would be in favour of raising an error.

orlp avatar Jan 09 '24 09:01 orlp

I'm also in favor of raising an error. If the developers are in agreement, could you accept this issue?

Wainberg avatar Jan 09 '24 21:01 Wainberg

Closing in favor of https://github.com/pola-rs/polars/issues/14427

stinodego avatar Feb 12 '24 21:02 stinodego