ORC-262: [C++] Support async IO prefetch for the ORC C++ library
What changes were proposed in this pull request?
Support async IO prefetch for the ORC C++ library. Closes https://issues.apache.org/jira/browse/ORC-262.
Changes:
- Added a new interface `InputStream::readAsync` (unimplemented by default). It performs IO asynchronously within the specified range.
- Added an IO cache implementation `ReadRangeCache` to cache async IO results. The design borrows from the similar design of the Parquet reader in https://github.com/apache/arrow.
- Added an interface `Reader::preBuffer` to trigger IO prefetch. In the concrete implementation `ReaderImpl::preBuffer`, IO ranges are calculated from the selected stripes and columns, these ranges are then merged and sorted, and `ReadRangeCache::cache` is called to trigger asynchronous IO in the background, ready for the upper layer to consume.
- Added an interface `Reader::releaseBuffer`, which releases all cached IO ranges before a given offset.
Why are the changes needed?
Async IO prefetch can hide IO latency while reading ORC files, which improves the performance of scan operators in ClickHouse.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
`Reader::preBuffer` prefetches stripes as a unit, which might be too large. Users who don't want to prefetch the entire file in one shot have to know the structure of the file. Do you think it would be a good idea to make prefetch transparent to users and let the ORC reader prefetch data (e.g. 1 MB per column at a time) when appropriate?
What's more, we could make async IO an option and expose a cache interface so users can implement their own eviction policy.
It is entirely up to the user whether to prefetch the whole ORC file, single or multiple columns in a single stripe, or a single column in single or multiple stripes. `Reader::preBuffer` already supports all of those options.
It is better to let users invoke `Reader::preBuffer` explicitly, because only the user knows which stripes/columns will be read, so they can find the best moment to prefetch and sufficiently hide IO latency. For example, the ORC prefetch implementation in ClickHouse relies on this PR: https://github.com/ClickHouse/ClickHouse/pull/70534 (a 1.47x speedup). Besides, the Parquet reader in Apache Arrow has a similar design.
@ffacs @wgtmac any more comments? Thanks!
Sorry that I'm a little bit overwhelmed these days. Will take a look when I get the chance.
BTW, @luffy-zh is working on exposing RowIndex positions: https://github.com/apache/orc/pull/2005. Perhaps there is an opportunity to prefetch IO further together with predicate pushdown.
@wgtmac That's great work. We could make further improvements on IO latency hiding after it is merged.
@wgtmac @ffacs I have finished the requested changes. Looking forward to your further reviews, thanks!
Thank you all. Given the active status of this PR, I added a milestone label, 2.1.0. Hopefully, Apache ORC 2.1 can have this.
- https://github.com/apache/orc/milestone/28 (2025-01-17).
Could you resolve the conflicts, @taiyang-li ?
Done.
@wgtmac Thanks for your advice. I have finished the requested changes. Do you think the PR is ready to be merged?
Thanks for the changes! It generally looks good. I think the test cases need to be polished.
Thanks for the detailed reviews. I have improved the test cases.
Merged to main for Apache ORC 2.1.0 in January 2025.
I added you to the Apache ORC contributor group and assigned ORC-262 to you, @taiyang-li .
Also, I updated the ORC-1767 JIRA issue by assigning it to you.
Thank you and welcome to the Apache ORC community again!
Thank you very much @dongjoon-hyun, I am very happy to join the Apache ORC Contributor Group. Apache Gluten relies heavily on this library, and I'm looking forward to contributing more in the future.