
Stop converting ['1', '2', '3'] to string dtype before initializing as a Double column

Open tamargrey opened this issue 3 years ago • 4 comments

A pandas bug was preventing us from initializing Woodwork on a column of numeric strings, so we had to convert the column to the string dtype first. https://github.com/alteryx/woodwork/pull/755/files#diff-75cb54847db4ed09b32148b93f1f543df16a421473154cb17085ff4932277ba8L402

This should be removed after the bug is fixed.

  • https://github.com/pandas-dev/pandas/issues/40729
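For context, the workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual Woodwork code: the helper name `to_float64_via_string` is hypothetical, and whether the intermediate cast is needed at all depends on the installed pandas version.

```python
import pandas as pd


def to_float64_via_string(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast numeric strings to nullable Float64.

    The object-dtype column is first converted to the pandas string dtype,
    then to Float64, instead of casting object -> Float64 directly.
    """
    return series.astype("string").astype("Float64")


series = pd.Series(["1", "2", "3"], dtype="object")
converted = to_float64_via_string(series)
print(converted.dtype)
```

Once the direct cast `series.astype("Float64")` works on the supported pandas versions, the intermediate `astype("string")` step should be safe to drop.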

tamargrey avatar Apr 01 '21 20:04 tamargrey

The original pandas issue (https://github.com/pandas-dev/pandas/issues/40729) that led us to add this workaround is fixed, and the workaround mentioned in this issue no longer seems to be present. So this issue can probably be closed.

tamargrey avatar Aug 15 '22 13:08 tamargrey

pandas 1.4.3

  • I still get the same error

pandas main (Aug 13), pandas-1.5.0.dev0+1285.g60b4400491

  • No error

import pandas as pd

series = pd.Series(['1', '2', '3', '4'], dtype='object')
series.astype(pd.Float64Dtype())

0    1.0
1    2.0
2    3.0
3    4.0
dtype: Float64

gsheni avatar Aug 15 '22 16:08 gsheni

So I believe we are blocked on this issue until pandas 1.5.0 comes out.

gsheni avatar Aug 15 '22 16:08 gsheni

Correct. My point was that the workaround I created this issue about may no longer be present (it's no longer in that test on woodwork main), in which case this issue can be closed.

tamargrey avatar Aug 16 '22 14:08 tamargrey