GH-43949: [C++] io::BufferedInput: Fixing the invalid state with SetBufferSize
Rationale for this change
See https://github.com/apache/arrow/issues/43949
The problem is Peek and Read both calls SetBufferSize, however:
Readimplicit says that, whenSetBufferSizeor read, the previous buffer is not being required. In this scenerio,bytes_buffered_is always 0, sinceReadwill consume all data. Andnew_buffer_size == std::min(new_buffer_size, (raw_read_bound_ - raw_read_total_))Peekstill requires the buffer-size here. So,bytes_buffered_might not 0. WhenPeek, thenew_buffer_sizewould less than expected, which shrinks the buffer
What changes are included in this PR?
Update the Logic of SetBufferSize
- If
bytes_buffered_ == 0,SetBufferSizecan discard the current buffer - Otherwise,
SetBufferSizeshould resize minimal tobuffer_size_ + (raw_read_bound_ - raw_read_total_), since it should considering the current buffer
Are these changes tested?
Yes
Are there any user-facing changes?
Bugfix
- GitHub Issue: #43949
:warning: GitHub issue #43949 has been automatically assigned in GitHub to PR creator.
@pitrou @felipecrv @wgtmac This might be a bit critical fixing
Also cc @biljazovic sorry for so slow replying
Are you calling SetBufferSize from your own code? I think it was a mistake to make this API public, it should only be used for initializing the buffered stream.
Are you calling SetBufferSize from your own code? I think it was a mistake to make this API public, it should only be used for initializing the buffered stream.
No, Peak will call SetBufferSize to adjust the size if possible. But call SetBufferSize explicitly will also cause the bug
@pitrou would you mind take a look when you have spare time? Thanks!
I'll merge it this week if no negative comment since it's a bugfix. Some comment can be address after bug is fixed
By the way, is this rebased on git main already?
By the way, is this rebased on git main already?
Previously not but rebased now
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 7d5a8186a514c57c3db2118677c01f61956396fe.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.