
ENH: support pd.NA in "category" dtype

Open devmcp opened this issue 3 years ago • 4 comments

Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I would like to be able to use pd.NA for missing data in a column of dtype "category".

Currently this:

pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")

converts the pd.NA to np.nan.

Feature Description

I think there should be a "category" dtype that supports pd.NA.

Alternative Solutions

I don't think there is currently a workaround.

Additional Context

No response

devmcp avatar Aug 05 '22 14:08 devmcp

Hi, thanks for your report. We already have this, through string-dtype:

pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
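For anyone following along, here is a minimal sketch of how that result can be inspected. The variable names (`df`, `col`) are only illustrative. As far as I understand it, the missing entry is never stored as a category; it is recorded as code -1, and the dtype of the categories themselves depends on the pandas version, so it is printed here rather than assumed.

```python
import pandas as pd

# Build the column as string dtype first, then convert to category.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
col = df["A"]

# The categories hold only the observed values; their dtype is whatever
# the string column carried over (this varies across pandas versions).
print(list(col.cat.categories))
print(col.cat.categories.dtype)

# Missing entries are encoded as code -1 rather than as a category.
print(col.cat.codes.tolist())
print(col.isna().tolist())
```

So the "category" layer sits on top of the underlying data type, which is why the starting dtype matters.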

phofl avatar Aug 05 '22 14:08 phofl

Related to #29962

mzeitlin11 avatar Aug 05 '22 17:08 mzeitlin11

Thanks for your reply @phofl - I didn't quite appreciate how the category "type" sat on top of the actual type of the data. Thanks for linking @mzeitlin11 - good to see I'm not the only one who didn't find this entirely intuitive at first pass.

devmcp avatar Aug 08 '22 15:08 devmcp

This is also a rather interesting problem for discussion under the scope of https://github.com/pandas-dev/pandas/pull/58988

Going to reopen to track this - having to go the .astype("category") route is pretty inefficient, but I think the alternative brings up some pitfalls:

>>> pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="category")
     A
0  one
1  two
2  NaN
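A rough side-by-side sketch of the two routes, in case it helps the discussion. The names `direct` and `via_string` are made up for illustration. My understanding is that both routes encode the missing entry as code -1, so any difference would come from the dtype of the categories, which is printed here rather than asserted because it depends on the pandas version and options in use.

```python
import pandas as pd

values = ["one", "two", pd.NA]

# Route 1: ask for "category" directly in the constructor.
direct = pd.DataFrame({"A": values}, dtype="category")["A"]

# Route 2: go through string dtype first, then convert.
via_string = pd.Series(values, dtype="string").astype("category")

for name, col in [("direct", direct), ("via string", via_string)]:
    # Show the categories' dtype and the integer codes for each route.
    print(name, col.cat.categories.dtype, col.cat.codes.tolist())
```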

WillAyd avatar Jun 25 '24 20:06 WillAyd