Pyarrow CSV Reader Integration Tracker
Issue tracking Pyarrow engine integration in read_csv (for after #38370 is merged)
Current unsupported options
- [ ] skipfooter
- [ ] float_precision
- [ ] chunksize
- [ ] comment
- [ ] nrows
- [ ] thousands
- [ ] memory_map
- [ ] dialect
- [ ] on_bad_lines
- [ ] delim_whitespace
- [ ] quoting
- [ ] lineterminator (this is c-only)
- [ ] converters
- [ ] decimal
- [ ] iterator
- [ ] dayfirst
- [ ] verbose
- [ ] skipinitialspace
- [ ] low_memory (this is c-only)
TODO
- [ ] Revert skipping of test_parse_dates due to removal of np.asarray in date_converters in #38370
- [ ] Enable skiprows tests disabled in #38370
- [ ] Enable encoding tests.
Proposed enhancements
- [ ] ENH: Pass arguments to `to_pandas` in `read_csv(engine="pyarrow")` #34823
FTR not all of these options are being targeted on the Arrow side (at least for the time being)
xref discussion in https://github.com/pandas-dev/pandas/issues/23697
@lithomas1 As long as we don't bump the minimum pyarrow version, we are really limited in what we can support.
Theoretically, we could write a function that determines the unsupported options based on the installed pyarrow version. This would let us work around the minimum pyarrow version when users have a newer pyarrow installed.
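A rough sketch of what such a version-based check might look like. The option-to-version table below is purely hypothetical (the version numbers are placeholders, not real pyarrow support data), and `parse_version` and `unsupported_options` are illustrative names, not existing pandas functions.

```python
from typing import Dict, Set, Tuple

# Hypothetical table: option -> minimum pyarrow version assumed to support it.
# These version numbers are placeholders for illustration only.
_HYPOTHETICAL_MIN_PYARROW: Dict[str, Tuple[int, ...]] = {
    "skipfooter": (99, 0),
    "comment": (3, 0),
    "nrows": (5, 0),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.0.1' into (4, 0, 1); stop at the first non-numeric piece (e.g. 'dev0')."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def unsupported_options(installed: str) -> Set[str]:
    """Return the options whose assumed minimum version is newer than `installed`."""
    installed_version = parse_version(installed)
    return {
        option
        for option, minimum in _HYPOTHETICAL_MIN_PYARROW.items()
        if installed_version < minimum
    }
```

In practice the installed version would come from `pyarrow.__version__`, and `read_csv` could raise for any user-supplied option found in the returned set.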
@lithomas1 status here?
Sorry for the silence here (forgot to post), planning on updating this list at the very least (some options are deprecated). I'll try to start chipping away at this again this or next week.