arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-43949: [C++] io::BufferedInput: Fixing the invalid state with SetBufferSize

Open mapleFU opened this issue 1 year ago • 5 comments

Rationale for this change

See https://github.com/apache/arrow/issues/43949

The problem is Peek and Read both calls SetBufferSize, however:

  1. Read implicit says that, when SetBufferSize or read, the previous buffer is not being required. In this scenerio, bytes_buffered_ is always 0, since Read will consume all data. And new_buffer_size == std::min(new_buffer_size, (raw_read_bound_ - raw_read_total_))
  2. Peek still requires the buffer-size here. So, bytes_buffered_ might not 0. When Peek, the new_buffer_size would less than expected, which shrinks the buffer

What changes are included in this PR?

Update the Logic of SetBufferSize

  1. If bytes_buffered_ == 0, SetBufferSize can discard the current buffer
  2. Otherwise, SetBufferSize should resize minimal to buffer_size_ + (raw_read_bound_ - raw_read_total_), since it should considering the current buffer

Are these changes tested?

Yes

Are there any user-facing changes?

Bugfix

  • GitHub Issue: #43949

mapleFU avatar Oct 12 '24 09:10 mapleFU

:warning: GitHub issue #43949 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Oct 12 '24 09:10 github-actions[bot]

@pitrou @felipecrv @wgtmac This might be a bit critical fixing

Also cc @biljazovic sorry for so slow replying

mapleFU avatar Oct 12 '24 09:10 mapleFU

Are you calling SetBufferSize from your own code? I think it was a mistake to make this API public, it should only be used for initializing the buffered stream.

pitrou avatar Oct 15 '24 15:10 pitrou

Are you calling SetBufferSize from your own code? I think it was a mistake to make this API public, it should only be used for initializing the buffered stream.

No, Peak will call SetBufferSize to adjust the size if possible. But call SetBufferSize explicitly will also cause the bug

mapleFU avatar Oct 15 '24 16:10 mapleFU

@pitrou would you mind take a look when you have spare time? Thanks!

mapleFU avatar Oct 16 '24 12:10 mapleFU

I'll merge it this week if no negative comment since it's a bugfix. Some comment can be address after bug is fixed

mapleFU avatar Oct 24 '24 05:10 mapleFU

By the way, is this rebased on git main already?

pitrou avatar Oct 24 '24 10:10 pitrou

By the way, is this rebased on git main already?

Previously not but rebased now

mapleFU avatar Oct 24 '24 10:10 mapleFU

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 7d5a8186a514c57c3db2118677c01f61956396fe.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.