LocalFiles::FileOrFiles::uri_folder could be brittle, do we want `exclude_invalid_files`?

Open westonpace opened this issue 2 years ago • 4 comments

In my (admittedly limited) experience it is pretty rare for a dataset directory to contain only data files and nothing else (no metadata files, dataset descriptions, etc.). I know we have `uri_glob`, but since we aren't requiring support for `**`, a glob alone may not be enough to filter these out.

In Arrow we have an `exclude_invalid_files` option which can be specified alongside a directory (it defaults to true, so maybe the protobuf name should be `assume_files_valid`). If set to true, we attempt to determine whether each file is a valid data file, which is a format-specific operation. For example, if we are reading Parquet we look for the magic bytes.
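To make the idea concrete, here is a minimal sketch of what such a check could look like for Parquet, using only the Python standard library. It is not Arrow's actual implementation. The function names `looks_like_parquet` and `list_data_files` are hypothetical, and the check relies only on the fact that a Parquet file starts and ends with the 4-byte magic `PAR1`.

```python
import os

# Parquet files begin and end with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Cheap validity probe (hypothetical helper): compare the first and
    last 4 bytes of the file to the Parquet magic. It does not parse the
    footer, so it can only rule files out, not prove they are readable."""
    try:
        with open(path, "rb") as f:
            head = f.read(4)
            f.seek(0, os.SEEK_END)
            # Smallest plausible file: magic + 4-byte footer length + magic.
            if f.tell() < 12:
                return False
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    except OSError:
        return False
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def list_data_files(folder, exclude_invalid_files=True):
    """List regular files directly inside `folder` (non-recursive).
    When `exclude_invalid_files` is true, drop anything that fails the
    magic-byte probe; otherwise return every file as-is."""
    paths = [os.path.join(folder, name) for name in sorted(os.listdir(folder))]
    files = [p for p in paths if os.path.isfile(p)]
    if exclude_invalid_files:
        files = [p for p in files if looks_like_parquet(p)]
    return files
```

Note that the probe costs one open and two small reads per file, so a consumer that already trusts the directory contents could skip it by passing `exclude_invalid_files=False`.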

westonpace · May 04 '22 21:05