ORC-262: [C++] Support async IO prefetch for the ORC C++ library
What changes were proposed in this pull request?
Support async IO prefetch for the ORC C++ library. Closes https://issues.apache.org/jira/browse/ORC-262.
Changes:
- Added a new interface `InputStream::readAsync` (unimplemented by default). It performs IO asynchronously within the specified range.
- Added an IO cache implementation `ReadRangeCache` to cache async IO results. The design borrows from the similar design of the Parquet reader in https://github.com/apache/arrow.
- Added an interface `Reader::preBuffer` to trigger IO prefetch. In the concrete implementation `ReaderImpl::preBuffer`, IO ranges are calculated from the selected stripes and columns, these ranges are then merged and sorted, and `ReadRangeCache::cache` is called to trigger asynchronous IO in the background, ready for the upper layer to consume.
- Added an interface `Reader::releaseBuffer`, which releases all cached IO ranges before a given offset.
Why are the changes needed?
Async IO prefetch can hide IO latency while reading ORC files, which improves the performance of scan operators in ClickHouse.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
`Reader::preBuffer` prefetches stripes as a unit, which might be too large. Users who don't want to prefetch the entire file in one shot have to know the structure of the file. Do you think it would be a good idea to make prefetch transparent to users and let the ORC reader prefetch data (e.g. 1 MB per column at a time) when appropriate?
What's more, we could make async IO an option and expose a cache interface so users can implement their own eviction policy.
It is entirely up to the user whether to prefetch the whole ORC file, single or multiple columns in a single stripe, or a single column in single or multiple stripes. `Reader::preBuffer` already supports all of those options.
It is better to let users invoke `Reader::preBuffer` explicitly, because only the user knows which stripes/columns will be read, so they can find the best moment to prefetch and sufficiently hide IO latency. For example, the ORC prefetch implementation in ClickHouse relies on this PR: https://github.com/ClickHouse/ClickHouse/pull/70534 (a 1.47x speedup). Besides, the Parquet reader in Apache Arrow has a similar design.
@ffacs @wgtmac any more comments? Thanks!
Sorry that I'm a little bit overwhelmed these days. Will take a look when I get the chance.
BTW, @luffy-zh is working on exposing RowIndex positions: https://github.com/apache/orc/pull/2005. Perhaps there is an opportunity to prefetch IO further together with predicate pushdown.
@wgtmac That's great work. We could make further improvements on IO latency hiding after it is merged.
@wgtmac @ffacs I have finished the requested changes. Looking forward to your further reviews, thanks!
Thank you all. Given the active status of this PR, I added a milestone label, 2.1.0. Hopefully, Apache ORC 2.1 can have this.
- https://github.com/apache/orc/milestone/28 (2025-01-17).
Could you resolve the conflicts, @taiyang-li ?
Done.
@wgtmac Thanks for your advice. I have finished the requested changes. Do you think the PR is ready to be merged?
Thanks for the changes! It generally looks good. I think the test cases need to be polished.
Thanks for the detailed reviews. I have improved the test cases.
Merged to main for Apache ORC 2.1.0 in January 2025.
I added you to the Apache ORC contributor group and assigned ORC-262 to you, @taiyang-li .
Also, I updated the ORC-1767 JIRA issue by assigning it to you.
Thank you and welcome to the Apache ORC community again!
Thank you very much @dongjoon-hyun, I am very happy to join the Apache ORC Contributor Group. Apache Gluten relies heavily on this library, and I'm looking forward to contributing more in the future.