ENH: support reading directory in read_csv
- [x] closes https://github.com/bodo-ai/Bodo-Pandas-Collaboration/issues/2
- [x] Tests added and passed if fixing a bug or adding a new feature
- [x] All code checks passed.
- [x] Added type annotations to new arguments/methods/functions.
- [x] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
FWIW I recall the team being negative in the past about supporting reading directories of files, and we document just concatting DataFrames read from a directory: https://pandas.pydata.org/docs/user_guide/cookbook.html#reading-multiple-files-to-create-a-single-dataframe. Are we sure we want to include this?
> FWIW I recall the team being negative in the past about supporting reading directories of files
Do you remember the reason? This seems like a useful thing, as I think it's common for some datasets to be split in different files with the same schema. And there is some added complexity to this, but it seems consistent with other syntactic sugar we have in IO operations such as decompressing, downloading, etc.
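For comparison, the cookbook-style workaround mentioned above looks roughly like the sketch below. The `read_csv_dir` helper name, the `*.csv` pattern, and the sample file names are made up for illustration; this is not the API proposed in this PR.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_dir(directory, pattern="*.csv", **kwargs):
    """Hypothetical helper: read every matching file and concatenate the results."""
    # Sort so the row order is deterministic across platforms.
    files = sorted(Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    # Extra keyword arguments are forwarded to pd.read_csv for each file.
    frames = [pd.read_csv(f, **kwargs) for f in files]
    return pd.concat(frames, ignore_index=True)


# Build a small directory of CSV files that share one schema, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}).to_csv(
        Path(tmp) / "part-0.csv", index=False
    )
    pd.DataFrame({"a": [3], "b": ["z"]}).to_csv(
        Path(tmp) / "part-1.csv", index=False
    )
    combined = read_csv_dir(tmp)
```

This is the pattern users have to write by hand today, which is presumably the boilerplate that directory support in `read_csv` would replace.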
Note that you've got the image from Will's book in this PR, this happened when we had to hard revert it from git history.
The remaining test failures are related to S3. Not sure what the root cause is. Trying to clean up the S3-related tests a bit in https://github.com/pandas-dev/pandas/pull/61703.
i think an unrelated file got added?
> i think an unrelated file got added?
Removed.