ARROW-16719: [Python] Add path/URI + filesystem handling to parquet.read_metadata
Add `filesystem` support to `pq.read_metadata` and `pq.read_schema`.
https://issues.apache.org/jira/browse/ARROW-16719
:warning: Ticket has not been started in JIRA, please click 'Start Progress'.
macOS CI Failure looks unrelated.
Thank you for working on this issue @kshitij12345! LGTM +1
@jorisvandenbossche can you please have a look before we merge this PR?
Failures look unrelated. Should I retrigger the CI?
I think some of the failures are related:
=================================== FAILURES ===================================
_______________________ test_metadata_schema_filesystem ________________________
tmpdir = local('/tmp/pytest-of-root/pytest-0/test_metadata_schema_filesyste0')
    def test_metadata_schema_filesystem(tmpdir):
        table = pa.table({"a": [1, 2, 3]})

        # URI writing to local file.
        fname = "data.parquet"
        file_path = 'file:///' + os.path.join(str(tmpdir), fname)

        pq.write_table(table, file_path)

        # Get expected `metadata` from path.
        metadata = pq.read_metadata(tmpdir / fname)
        schema = table.schema

        assert pq.read_metadata(file_path).equals(metadata)
>       assert pq.read_metadata(
            fname, filesystem=f'file:///{tmpdir}').equals(metadata)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/parquet/test_metadata.py:553:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/parquet/__init__.py:3425: in read_metadata
file = ParquetFile(where, memory_map=memory_map,
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/parquet/__init__.py:287: in __init__
self.reader.open(
pyarrow/_parquet.pyx:1225: in pyarrow._parquet.ParquetReader.open
???
pyarrow/io.pxi:1674: in pyarrow.lib.get_reader
???
pyarrow/io.pxi:1665: in pyarrow.lib.get_native_file
???
pyarrow/io.pxi:943: in pyarrow.lib.OSFile.__cinit__
???
pyarrow/io.pxi:953: in pyarrow.lib.OSFile._open_readable
???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E FileNotFoundError: [Errno 2] Failed to open local file 'data.parquet'. Detail: [errno 2] No such file or directory
Thanks for catching that @AlenkaF!
~~And Windows strikes 😓 ! I don't have access to a Windows system. I think `file:///` is not handled on Windows. Do you have any recommendations, or is it ok to skip that particular approach on Windows?~~
It looks to be happening on other platforms as well. My bad, I read the CI logs incorrectly.
Gentle ping @jorisvandenbossche @AlenkaF
Gentle ping :)
@kshitij12345 there are some test failures that actually seem related
@jorisvandenbossche CI failure looks irrelevant. PTAL :)
I pushed a small additional update to the test (mainly changing it to use our internal `tempdir` fixture instead of `tmpdir`).
Thanks again for the PR @kshitij12345 !
Benchmark runs are scheduled for baseline = f6127fca7ade9665f31493d37929346e651ed0e4 and contender = 42ed37e3fc84465f365531e611f1bf632b599e7b. 42ed37e3fc84465f365531e611f1bf632b599e7b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:3.44% :arrow_up:2.89%] test-mac-arm
[Failed :arrow_down:4.38% :arrow_up:1.92%] ursa-i9-9960x
[Finished :arrow_down:5.12% :arrow_up:2.42%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 42ed37e3 ec2-t3-xlarge-us-east-2
[Finished] 42ed37e3 test-mac-arm
[Failed] 42ed37e3 ursa-i9-9960x
[Finished] 42ed37e3 ursa-thinkcentre-m75q
[Finished] f6127fca ec2-t3-xlarge-us-east-2
[Finished] f6127fca test-mac-arm
[Failed] f6127fca ursa-i9-9960x
[Finished] f6127fca ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
['Python', 'R'] benchmarks have high level of regressions. test-mac-arm ursa-i9-9960x
@jorisvandenbossche Thank you very much :)