pandas Pyarrow CSV Reader Integration Tracker

Open lithomas1 opened this issue 4 years ago • 4 comments

Issue tracking Pyarrow engine integration in read_csv (for after #38370 is merged).

Current unsupported options

[ ] skipfooter
[ ] float_precision
[ ] chunksize
[ ] comment
[ ] nrows
[ ] thousands
[ ] memory_map
[ ] dialect
[ ] on_bad_lines
[ ] delim_whitespace
[ ] quoting
[ ] lineterminator (this is c-only)
[ ] converters
[ ] decimal
[ ] iterator
[ ] dayfirst
[ ] verbose
[ ] skipinitialspace
[ ] low_memory (this is c-only)

TODO
[ ] Revert skipping of test_parse_dates due to removal of np.asarray in date_converters in #38370
[ ] Enable skiprows tests disabled in #38370
[ ] Enable encoding tests.

Proposed enhancements
[ ] ENH: Pass arguments to to_pandas in `read_csv(engine="pyarrow")` #34823
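
For illustration only, the unsupported options listed in this issue can be treated as a deny-list that is checked before a call is dispatched to the pyarrow engine. The helper name `check_pyarrow_options` and the error message are hypothetical and are not pandas API; the set reflects only what this issue lists, not the current state of pandas.

```python
# Options this issue lists as unsupported by the pyarrow engine.
# Snapshot of the checklist in this issue, not of any pandas release.
PYARROW_UNSUPPORTED = frozenset({
    "skipfooter", "float_precision", "chunksize", "comment", "nrows",
    "thousands", "memory_map", "dialect", "on_bad_lines",
    "delim_whitespace", "quoting", "lineterminator", "converters",
    "decimal", "iterator", "dayfirst", "verbose", "skipinitialspace",
    "low_memory",
})


def check_pyarrow_options(**kwargs):
    """Hypothetical helper: raise if any keyword is on the deny-list."""
    bad = sorted(PYARROW_UNSUPPORTED.intersection(kwargs))
    if bad:
        raise ValueError(
            "not supported with engine='pyarrow': " + ", ".join(bad)
        )


# A supported keyword passes silently; a listed one raises ValueError.
check_pyarrow_options(sep=",")
```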

Jan 01 '21 03:01 lithomas1

FTR not all of these options are being targeted on the Arrow side (at least for the time being)

xref discussion in https://github.com/pandas-dev/pandas/issues/23697

Jan 01 '21 19:01 arw2019

@lithomas1 As long as we don't bump the minimum pyarrow version, we are really limited in what we can support.

Theoretically, we could write a function determining the unsupported actions based on the pyarrow version which is installed. This would allow us to circumvent the minimum pyarrow version if users have a newer pyarrow installed.
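
A minimal sketch of that idea, assuming a table that maps each option to the first pyarrow version able to handle it. The function names and every version threshold below are placeholders invented for illustration; they are not real pyarrow release data and not pandas internals. The version parsing is also simplified and only handles plain `major.minor[.patch]` strings.

```python
from __future__ import annotations

# Hypothetical table: option -> first (major, minor) pyarrow version that
# could support it. None means "not supported by any version".
# PLACEHOLDER VALUES ONLY, not taken from pyarrow release notes.
_MIN_PYARROW_FOR_OPTION: dict[str, tuple[int, int] | None] = {
    "comment": (2, 0),
    "decimal": (5, 0),
    "skipfooter": None,
}


def _parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a plain 'major.minor[.patch]' string."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def unsupported_options(installed_version: str) -> set[str]:
    """Options that the given pyarrow version cannot handle."""
    current = _parse_version(installed_version)
    return {
        option
        for option, minimum in _MIN_PYARROW_FOR_OPTION.items()
        if minimum is None or current < minimum
    }
```

In real use the installed version would come from `pyarrow.__version__`; it is passed in as a string here so the sketch runs without pyarrow installed.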

Dec 23 '21 13:12 phofl

@lithomas1 status here?

Jan 23 '23 16:01 jbrockmendel

Sorry for the silence here (forgot to post). I'm planning on updating this list at the very least (some options are deprecated). I'll try to start chipping away at this again this week or next.

Feb 08 '23 17:02 lithomas1
