ENH: support reading directory in read_csv

Open fangchenli opened this issue 8 months ago • 6 comments

  • [x] closes https://github.com/bodo-ai/Bodo-Pandas-Collaboration/issues/2
  • [x] Tests added and passed if fixing a bug or adding a new feature
  • [x] All code checks passed.
  • [x] Added type annotations to new arguments/methods/functions.
  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

fangchenli avatar Apr 12 '25 07:04 fangchenli

FWIW I recall the team being negative in the past about supporting reading directories of files, and we document just concatenating DataFrames read from a directory instead: https://pandas.pydata.org/docs/user_guide/cookbook.html#reading-multiple-files-to-create-a-single-dataframe. Are we sure we want to include this?
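
For context, the concatenation approach referred to above can be sketched roughly as follows. This is a minimal illustration, not the implementation proposed in this PR: `read_csv_directory` and its `pattern` parameter are hypothetical names, and the sketch assumes every file in the directory shares the same schema.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_directory(directory, pattern="*.csv", **read_csv_kwargs):
    """Hypothetical helper: read every matching CSV file and concatenate the results.

    Assumes all files share the same columns; extra keyword arguments are
    forwarded unchanged to pd.read_csv.
    """
    # Sort the paths so the row order of the result is deterministic.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    frames = [pd.read_csv(path, **read_csv_kwargs) for path in paths]
    # ignore_index=True rebuilds a single 0..n-1 index across all files.
    return pd.concat(frames, ignore_index=True)


# Small demonstration using two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv(
        os.path.join(tmp, "part-0.csv"), index=False
    )
    pd.DataFrame({"a": [5], "b": [6]}).to_csv(
        os.path.join(tmp, "part-1.csv"), index=False
    )
    combined = read_csv_directory(tmp)
```

Everything here uses only existing pandas calls (`pd.read_csv`, `pd.concat`), which is why the question is whether built-in directory support adds enough over a few lines of user code.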

mroeschke avatar Apr 22 '25 16:04 mroeschke

> FWIW I recall the team being negative in the past about supporting reading directories of files

Do you remember the reason? This seems like a useful thing, as I think it's common for datasets to be split across several files with the same schema. This does add some complexity, but it seems consistent with other syntactic sugar we have in IO operations, such as decompressing, downloading, etc.

datapythonista avatar Apr 30 '25 22:04 datapythonista

Note that you've got the image from Will's book in this PR. This happened when we had to hard revert it from git history.

datapythonista avatar May 20 '25 22:05 datapythonista

The remaining test failures are related to S3. Not sure what the root cause is. Trying to clean up the S3-related tests a bit in https://github.com/pandas-dev/pandas/pull/61703.

fangchenli avatar Jun 25 '25 06:06 fangchenli

I think an unrelated file got added?

jbrockmendel avatar Aug 15 '25 21:08 jbrockmendel

> I think an unrelated file got added?

Removed.

fangchenli avatar Aug 15 '25 22:08 fangchenli