
[Python] Implement conversion between integer coded as floating points with NaN to an Arrow integer type

Open asfimport opened this issue 8 years ago • 10 comments

For example, if pandas has cast integer data to float, this would enable the integer data to be recovered (so long as the values fall within the ~2^53 range in which floating point represents integers exactly).
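To illustrate the 2^53 bound mentioned above, here is a small standard-library sketch. The helper name `roundtrips_exactly` is hypothetical and only serves the illustration; it checks whether an integer survives a trip through a 64-bit float unchanged.

```python
# Sketch: why 2^53 is the limit for exact integers in float64.
# `roundtrips_exactly` is a hypothetical helper name, not a pandas or pyarrow API.
LIMIT = 2 ** 53


def roundtrips_exactly(n: int) -> bool:
    """Return True if converting n to float and back yields n again."""
    return int(float(n)) == n


# Every integer up to 2^53 in magnitude is representable exactly.
print(roundtrips_exactly(LIMIT))
# 2^53 + 1 is not representable; it rounds to a neighbouring float.
print(roundtrips_exactly(LIMIT + 1))
# The two distinct integers collapse to the same float value.
print(float(LIMIT) == float(LIMIT + 1))
```

Beyond this bound, two different integers can map to the same float, so the original integer can no longer be recovered with certainty.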

Reporter: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-488. Please see the migration documentation for further details.

asfimport avatar Jan 17 '17 01:01 asfimport

Miki Tebeka / @tebeka: Is the dtype still integer? I see that pandas changes the dtype once you add a NaN:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: s = pd.Series([1,2,3])

In [4]: s
Out[4]: 
0    1
1    2
2    3
dtype: int64

In [5]: s[1] = np.nan

In [6]: s
Out[6]: 
0    1.0
1    NaN
2    3.0
dtype: float64

asfimport avatar Mar 09 '17 15:03 asfimport

Wes McKinney / @wesm: @tebeka the pandas behavior is the motivation for this JIRA

Because pandas implicitly converts from integer to float when introducing null values, the task in this JIRA is to convert (safely) from floating point with NaNs to Arrow integer types with proper nulls

In [2]: pyarrow.Array.from_list([1, 2, None, 4, None])
Out[2]: 
<pyarrow.array.Int64Array object at 0x7fb24fe97bd8>
[
  1,
  2,
  NA,
  4,
  NA
]

asfimport avatar Mar 09 '17 15:03 asfimport

Wes McKinney / @wesm: After ARROW-618, this functionality should be more easily achievable through syntax like

Array.from_pandas(float_data, type=int64())

This would raise an exception on any values that are not safe to cast (absolute value exceeding 2^53)

asfimport avatar Mar 16 '17 22:03 asfimport

Wes McKinney / @wesm: This seems like it could simply be a casting option for floating point to integer conversions

asfimport avatar Sep 08 '17 02:09 asfimport

Antoine Pitrou / @pitrou: Is this the same as ARROW-2135, or am I missing something here?

asfimport avatar Mar 01 '18 18:03 asfimport

Wes McKinney / @wesm: As currently scoped, yes. This functionality is not available in arrow::compute::Cast though, so perhaps we can repurpose this JIRA to add this functionality, which may be a bit more complicated (since Cast is not yet able to deal with any null sentinels at all)

asfimport avatar Mar 01 '18 23:03 asfimport

Wes McKinney / @wesm: It would be good to have an explicit cast option for this, like arr.cast(int64(), nan_as_null=True). The safe=False/True option does not provide enough control

asfimport avatar Jul 09 '18 18:07 asfimport

Wes McKinney / @wesm: Circling back on this some time later. I think it would be better to implement this as a separate function (whenever someone needs it) instead of adding complexity to Cast

asfimport avatar Mar 14 '20 22:03 asfimport

Wes McKinney / @wesm: This could be implemented as a standalone function in the new kernels framework

asfimport avatar May 25 '20 14:05 asfimport

This issue hasn't had activity in a long time. If it's still being worked on, please leave a comment. Otherwise, it will be closed on 23rd June.

Labelled Status: Stale-Warning for tracking.

thisisnic avatar Jun 21 '25 08:06 thisisnic