arrow
arrow copied to clipboard
ARROW-13763: [Python] Close files in ParquetFile & ParquetDatasetPiece
Will fix ARROW-13763
A separate Jira issue will be made to address closing files in V2 ParquetDataset, which needs to be handled in the C++ layer.
Adds context manager to pq.ParquetFile
to close input file, and ensure reads within pq.ParquetDataset
and pq.read_table
are closed.
# user opened file-like object will not be closed
with open('file.parquet', 'rb') as f:
with pq.ParquetFile(f) as p:
table = p.read()
assert not f.closed # did not inadvertently close the open file
assert not p.closed
assert not f.closed # parquet context exit didn't close it
assert not p.closed # references the input file status
assert f.closed # normal context exit close
assert p.closed # ...
# path-like will be closed upon exit or `ParquetFile.close`
with pq.ParquetFile('file.parquet') as p:
table = p.read()
assert not p.closed
assert p.closed
https://issues.apache.org/jira/browse/ARROW-13763
But in the case the user passes an file object, we should maybe not close it? (while such a PythonFile object will pass through a close() call to the underlying file object)
So this would imply calling ParquetFile.close
would also not close the passed file? So my second example would be this?
p = pq.ParquetFile(buf)
table = p.read()
assert buf.closed is False
p.close() # In the situation of a passed in open file object, this call would mean nothing, right?
assert buf.closed is False
A separate Jira issue will be made to address closing files in V2 ParquetDataset, which needs to be handled in the C++ layer.
Has that issue been opened already?
It seems ARROW-16421 might cross over enough with the intended issue. See recent comments there on whether explicit closing ought to be done in C++.
The windows failure is happening on master and other PRs as well.
Benchmark runs are scheduled for baseline = b0422e5feb6d335f8e726f6221d42f2a044cd02c and contender = 951663a41c183c8fec5a4da9a8f9daf45ed85451. 951663a41c183c8fec5a4da9a8f9daf45ed85451 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Failed :arrow_down:3.95% :arrow_up:0.85%] test-mac-arm
[Failed :arrow_down:3.01% :arrow_up:0.27%] ursa-i9-9960x
[Finished :arrow_down:1.57% :arrow_up:0.92%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 951663a4
ec2-t3-xlarge-us-east-2
[Finished] 951663a4
test-mac-arm
[Failed] 951663a4
ursa-i9-9960x
[Finished] 951663a4
ursa-thinkcentre-m75q
[Finished] b0422e5f
ec2-t3-xlarge-us-east-2
[Failed] b0422e5f
test-mac-arm
[Failed] b0422e5f
ursa-i9-9960x
[Finished] b0422e5f
ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
['Python', 'R'] benchmarks have high level of regressions. test-mac-arm ursa-i9-9960x