BUG: fillna('') does not replace NaT

Open jreback opened this issue 9 years ago • 9 comments

pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.

For a setting operation this is convenient and, I think, what a user expects.

In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [36]: df
Out[36]:
    A    B    C  D
0 NaT  NaN  foo  1

In [37]: df.dtypes
Out[37]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [38]: df.loc[0,'D'] = 1.0

In [39]: df.dtypes
Out[39]:
A    datetime64[ns]
B            object
C            object
D           float64
dtype: object

However, for a .fillna (or .replace) operation this might be a bit unexpected: A was coerced to object dtype, even though it was datetime64[ns].

In [40]: df.fillna('')
Out[40]:
  A B    C  D
0      foo  1

In [41]: df.fillna('').dtypes
Out[41]:
A     object
B     object
C     object
D    float64
dtype: object

So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. This last behavior would be the equivalent of errors='coerce'. Skipping this column would be done with errors='coerce' (and of course raise would raise).

Ideally this would default to coerce, I think (to skip non-compatible values). Any thoughts on this?
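
Until a keyword like this exists, the "skip incompatible columns" behavior can be approximated by hand. The following is a minimal sketch, not the proposed API: it assumes that "incompatible" simply means datetime64 columns, and the names fill_cols and filled are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series([pd.NaT], dtype="M8[ns]"),
    "B": pd.Series([np.nan], dtype="object"),
    "C": ["foo"],
    "D": [1],
})

# Assumption: only datetime64 columns are treated as incompatible with ''.
fill_cols = df.select_dtypes(exclude=["datetime64"]).columns

# Passing a dict to fillna restricts the fill to the named columns,
# so column A is left untouched and keeps its datetime64 dtype.
filled = df.fillna({col: "" for col in fill_cols})
print(filled.dtypes)
```

Here the choice of which columns to skip is made explicitly by the caller, which is roughly what a "skip" mode of the keyword would have to decide internally.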

jreback avatar Jan 04 '16 14:01 jreback

cc @ywang007

jreback avatar Jan 04 '16 14:01 jreback

xref. #15533

@jreback I think this keyword would be a :+1:. It would be a way of harmonizing the arguments for and against forceful versus weak validation that are under discussion in PR #15587. Once that PR is merged, this behavior could presumably be added as a single if errors == 'raise': validate_fill_value(obj, value) call.

I think it's worth considering adding similar behavior to methods implementing fill_value. I'm not sure I like that idea, it feels like a lot of API overhead, but, worth considering.

ResidentMario avatar Mar 08 '17 22:03 ResidentMario

This behavior no longer coerces to object. I suppose it could use a test, orthogonal to the enhancement request.

In [34]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [35]: df.loc[0,'D'] = 1.0

In [36]: df.dtypes
Out[36]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [37]: df.fillna('')
Out[37]:
    A B    C  D
0 NaT    foo  1

In [38]: df.fillna('').dtypes
Out[38]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [39]: pd.__version__
Out[39]: '1.3.0.dev0+1383.g855696cde0'

mroeschke avatar Apr 21 '21 05:04 mroeschke

Actually, I think this is a bug and the original behavior was correct: NaT is an NA value, and it wasn't replaced by the empty string.

In [1]: df = DataFrame({'A': Series(dtype='M8[ns]'), 'B': Series([np.nan], dtype='object'), 'C': ['foo'], 'D': [1]})

In [2]: df.fillna('')
Out[2]:
    A B    C  D
0 NaT    foo  1

In [3]: df.fillna('').dtypes
Out[3]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [4]: df.fillna(2).dtypes
Out[4]:
A     int64
B     int64
C    object
D     int64
dtype: object

In [5]: df.fillna(2)
Out[5]:
   A  B    C  D
0  2  2  foo  1

mroeschke avatar May 12 '21 03:05 mroeschke

Hello, just to add to this thread. I have encountered this bug when upgrading pandas from 1.2.5 to 1.3.3 (it looks like this bug was introduced in version 1.3.0).

When using fillna or replace on a datetime series, converting to empty string "" will not work. However, when using another string e.g. "hello" it will work, and coerce the series to object type. Also interestingly, df.replace({pd.NaT: ""}) has different behaviour to df.replace(pd.NaT, "")

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]:
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]:
       A
0  hello

In [5]: df.replace(pd.NaT, "")
Out[5]:
    A
0 NaT

In [6]: df.replace(pd.NaT, "hello")
Out[6]:
       A
0  hello

In [7]: df.replace({pd.NaT, ""})
Out[7]:
    A
0 NaT

In [8]: df.replace({pd.NaT, "hello"})
Out[8]:
    A
0 NaT
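
One detail worth checking in the transcript above: {pd.NaT, ""} with a comma is a Python set literal, whereas {pd.NaT: ""} with a colon is a dict. This may be why In [7] and In [8] behave differently from In [5] and In [6], though that is only a guess. A quick check of the two literals:

```python
import pandas as pd

as_set = {pd.NaT, ""}    # comma: a set containing two elements
as_dict = {pd.NaT: ""}   # colon: a dict mapping NaT to ''

print(type(as_set).__name__, type(as_dict).__name__)
```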

eirkkr avatar Sep 21 '21 06:09 eirkkr

Also reproduced on 1.3.4

AvivAvital2 avatar Oct 26 '21 08:10 AvivAvital2

Same here on the latest 1.4.2: df.fillna('') doesn't work with NaT (pd.isnull() gives True though).

df.fillna('something') works...

Very surprising that it has been here since 2016?

yeyeric avatar May 02 '22 18:05 yeyeric

Same on version 1.4.3: with df = pd.DataFrame({"A": [pd.NaT]}), df.fillna("") does nothing, while df.fillna(" ") replaces NaT with a blank space.

evelynegroen avatar Jul 19 '22 11:07 evelynegroen

Same here: NaT still shows after filling NA with an empty string via df.fillna('').

Supertramplee avatar Aug 10 '22 05:08 Supertramplee

The core issue here appears to be that the Timestamp constructor interprets the empty string as pd.NaT, so the fill value is treated as compatible with the column and the datetime64 dtype is not upcast to object.

In [8]: pd.Timestamp("")
Out[8]: NaT

In [9]: pd.Timestamp(" ")
ValueError: could not convert string to Timestamp

If the behavior in Out[8] were deprecated so that it no longer returns NaT, this bug would probably be fixed.
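
A minimal sketch for checking whether a given fill value is likely to hit this case, assuming the pandas versions discussed in this thread. The helper name parses_as_nat is hypothetical, not a pandas function:

```python
import pandas as pd

def parses_as_nat(value):
    """Return True if pd.Timestamp(value) yields NaT (hypothetical helper)."""
    try:
        return pd.Timestamp(value) is pd.NaT
    except (ValueError, TypeError):
        # Strings that cannot be parsed as a datetime raise instead.
        return False

# Candidate fill values taken from the reports in this thread.
for candidate in ["", " ", "hello", "NAN", "NAN_"]:
    print(repr(candidate), parses_as_nat(candidate))
```

Fill values for which this returns True would be expected to leave NaT in place in a datetime64 column, while the others should trigger the upcast to object.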

mroeschke avatar Aug 10 '22 16:08 mroeschke

This might work as a temporary measure 👍

# 1. convert datetime to string
df["target"] = df["target"].dt.strftime('%Y-%m-%d %H:%M:%S')

# 2. fillna
replace_datetime_in_str = "2023-01-01 00:00:00"
df["target"] = df["target"].fillna(replace_datetime_in_str)

# 3. convert string to datetime
df["target"] = pd.to_datetime(df["target"])
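
If the goal is an empty string rather than a placeholder date, another possible workaround is to give up the datetime64 dtype explicitly before filling. This is a minimal sketch with a made-up Series; the result is an object column holding Timestamp objects and strings, so datetime accessors such as .dt no longer apply to it.

```python
import pandas as pd

s = pd.Series([pd.Timestamp("2023-01-01"), pd.NaT])

# Cast to object first so the fill value is not interpreted as a datetime.
filled = s.astype(object).fillna("")
print(filled.tolist())
```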

Masumi-M avatar Dec 08 '22 22:12 Masumi-M

I'm a novice, but it seems to still be present in 2.0.1

ciscorucinski avatar May 27 '23 16:05 ciscorucinski

still present

msingh0101 avatar Oct 07 '23 00:10 msingh0101

There is also a bug when replacing with the string "NAN":

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]: 
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]: 
       A
0  hello

In [5]: df.fillna("NAN")
Out[5]: 
    A
0 NaT

In [6]: df.fillna("NAN_")
Out[6]: 
      A
0  NAN_

baptiste-pasquier avatar Oct 25 '23 16:10 baptiste-pasquier