woodwork
Use string[pyarrow] dtype for all Logical Types that use string dtype
- pandas 1.3 has a new string[pyarrow] dtype that saves memory and improves speed
- https://pythonspeed.com/articles/pandas-string-dtype-memory/
- We should use it, and verify that the downstream libraries that rely on these dtypes (Featuretools, EvalML) still work.
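To make the memory claim above easy to check locally, here is a minimal sketch that compares the deep memory usage of the same strings under different pandas dtypes. The helper name `string_memory_usage` and the sample values are hypothetical, not part of Woodwork; the sketch assumes pandas>=1.3 and skips the Arrow-backed dtype if the optional pyarrow package is not installed.

```python
import pandas as pd


def string_memory_usage(values):
    """Return deep memory usage in bytes for each string storage option.

    Hypothetical helper for illustration only; not a Woodwork API.
    """
    usage = {}
    for dtype in ("object", "string", "string[pyarrow]"):
        try:
            series = pd.Series(values, dtype=dtype)
        except ImportError:
            # pyarrow is an optional dependency; skip the dtype if it is missing.
            continue
        usage[dtype] = int(series.memory_usage(index=False, deep=True))
    return usage


if __name__ == "__main__":
    # Made-up sample data: a small vocabulary repeated many times.
    sample = ["woodwork", "featuretools", "evalml"] * 10_000
    for dtype, nbytes in string_memory_usage(sample).items():
        print(f"{dtype:>16}: {nbytes:,} bytes")
```

The exact numbers depend on the pandas and pyarrow versions and on the data, so the sketch prints them rather than asserting a particular saving.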
Reopening this issue. We are blocked until the next pandas release, which should resolve this issue:
- https://github.com/alteryx/woodwork/issues/1390
We are also blocked until we determine why PySpark does not work with pandas>=1.4
- https://github.com/alteryx/woodwork/issues/1423
pandas 1.4.3 was just released, which resolved the bug in #1390.
@gsheni As far as I can tell, 1.4.3 hasn't resolved this bug yet and the fix is probably going to be part of the next release. Can you confirm that you're not seeing it anymore?
Looks like with pandas==1.5.0, this case passes! Would be worthwhile to look into this issue again and see if we can get this string[pyarrow] type back into WW.
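If the dtype is brought back, one option is to gate it on the installed pandas version. The sketch below is only an illustration of that idea: the helper names and the 1.5.0 cutoff are assumptions taken from the comment above (the failing case reportedly passes on 1.5.0), not a decision made in this thread.

```python
import re

# Assumption: 1.5.0 is the cutoff only because the thread reports the
# previously failing case passing with that release.
MIN_PANDAS_FOR_ARROW_STRINGS = (1, 5, 0)


def parse_release(version):
    """Extract (major, minor, patch) from a version string such as '1.4.3'.

    Pre-release suffixes (e.g. '1.5.0rc0') are ignored, so a release
    candidate is treated the same as the final release.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def preferred_string_dtype(pandas_version):
    """Pick a string dtype name for the given pandas version (hypothetical helper)."""
    if parse_release(pandas_version) >= MIN_PANDAS_FOR_ARROW_STRINGS:
        return "string[pyarrow]"
    return "string"


if __name__ == "__main__":
    for version in ("1.3.5", "1.4.3", "1.5.0"):
        print(version, preferred_string_dtype(version))
```

In real code the version string would come from `pandas.__version__`, and the PySpark incompatibility tracked in #1423 would still need to be resolved separately.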