Pyarrow CSV Reader Integration Tracker

Open lithomas1 opened this issue 4 years ago • 4 comments
Issue tracking Pyarrow engine integration in read_csv (for after #38370 is merged)

Current unsupported options

  • [ ] skipfooter
  • [ ] float_precision
  • [ ] chunksize
  • [ ] comment
  • [ ] nrows
  • [ ] thousands
  • [ ] memory_map
  • [ ] dialect
  • [ ] on_bad_lines
  • [ ] delim_whitespace
  • [ ] quoting
  • [ ] lineterminator (this is c-only)
  • [ ] converters
  • [ ] decimal
  • [ ] iterator
  • [ ] dayfirst
  • [ ] verbose
  • [ ] skipinitialspace
  • [ ] low_memory (this is c-only)

TODO

  • [ ] Revert skipping of test_parse_dates due to removal of np.asarray in date_converters in #38370
  • [ ] Enable skiprows tests disabled in #38370
  • [ ] Enable encoding tests.

Proposed enhancements

  • [ ] ENH: Pass arguments to to_pandas in `read_csv(engine="pyarrow")` #34823

lithomas1 avatar Jan 01 '21 03:01 lithomas1

FTR not all of these options are being targeted on the Arrow side (at least for the time being)

xref discussion in https://github.com/pandas-dev/pandas/issues/23697

arw2019 avatar Jan 01 '21 19:01 arw2019

@lithomas1 As long as we don't bump the minimum pyarrow version, we are really limited in what we can support.

Theoretically, we could write a function determining the unsupported actions based on the pyarrow version which is installed. This would allow us to circumvent the minimum pyarrow version if users have a newer pyarrow installed.
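
A minimal sketch of what such a function might look like, assuming a lookup table from option name to the minimum pyarrow version that supports it. The table name, the option names, and the version numbers below are hypothetical placeholders for illustration; they are not taken from pandas or pyarrow.

```python
from __future__ import annotations

import re

# Hypothetical table: option name -> minimum pyarrow version assumed to
# support it. All names and versions here are placeholders, not real facts
# about pyarrow. None means "assumed unsupported at any version".
HYPOTHETICAL_MIN_PYARROW: dict[str, tuple[int, int, int] | None] = {
    "option_a": (1, 0, 0),
    "option_b": (4, 0, 0),
    "option_c": None,
}


def parse_version(version: str) -> tuple[int, ...]:
    """Convert '4.0.1' or '5.0.0.dev12' into a tuple of at least three ints."""
    parts: list[int] = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev12'
        parts.append(int(match.group()))
    # Pad so that '4.0' compares equal to (4, 0, 0) rather than below it.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def unsupported_options(
    installed: str,
    table: dict[str, tuple[int, int, int] | None] = HYPOTHETICAL_MIN_PYARROW,
) -> set[str]:
    """Return the options the given pyarrow version cannot handle."""
    current = parse_version(installed)
    return {
        name
        for name, minimum in table.items()
        if minimum is None or current < minimum
    }
```

In practice the installed version string would come from `pyarrow.__version__`, and the parser would raise for any option the function returns, so users on a newer pyarrow get more options without the minimum version being bumped.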

phofl avatar Dec 23 '21 13:12 phofl

@lithomas1 status here?

jbrockmendel avatar Jan 23 '23 16:01 jbrockmendel

Sorry for the silence here (forgot to post), planning on updating this list at the very least (some options are deprecated). I'll try to start chipping away at this again this or next week.

lithomas1 avatar Feb 08 '23 17:02 lithomas1