
Use string[pyarrow] dtype for all Logical Types that use string dtype

Open • gsheni opened this issue 3 years ago • 4 comments

  • pandas 1.3 added a new Arrow-backed string[pyarrow] dtype that saves memory and improves speed
    • https://pythonspeed.com/articles/pandas-string-dtype-memory/
  • We should use it, and verify that everything downstream that consumes these columns (Featuretools, EvalML) still works.

gsheni, Feb 28 '22

Reopening this issue. We are blocked until pandas ships a release that fixes the following bug:

  • https://github.com/alteryx/woodwork/issues/1390

We are also blocked until we determine why PySpark does not work with pandas>=1.4:

  • https://github.com/alteryx/woodwork/issues/1423

gsheni, May 11 '22

pandas 1.4.3 was just released, which resolved the bug in #1390.

gsheni, Jun 23 '22

@gsheni As far as I can tell, 1.4.3 hasn't resolved this bug yet and the fix is probably going to be part of the next release. Can you confirm that you're not seeing it anymore?

ParthivNaresh, Jul 05 '22

It looks like this case passes with pandas==1.5.0! It would be worthwhile to revisit this issue and see if we can bring the string[pyarrow] type back into Woodwork (WW).

bchen1116, Jan 13 '23