
SegmentAccessCounters treat binary search as full scan

Open: mrks opened this issue 3 years ago • 1 comment

The basic assumption of the SegmentAccessCounters is that iterables are fully consumed, i.e., that every value of an iterable is touched:

https://github.com/hyrise/hyrise/blob/09943167c9dbbec99f0829673453c9244ce4108c/src/lib/storage/value_segment/value_segment_iterable.hpp#L34-L35

This is not necessarily the case. If a scan uses SortedSegmentSearch, the access counters are incremented as if the entire segment were scanned, even though the binary search touches only a few positions:

https://github.com/hyrise/hyrise/blob/09943167c9dbbec99f0829673453c9244ce4108c/src/lib/operators/table_scan/sorted_segment_search.hpp#L53-L59

This does not break anything in my thesis, but only because scans have so little influence on the overall TPC-H performance. It should still be fixed.

As a first idea, advance(ptrdiff_t n) could decrease the access counter by n - 1.
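
A minimal sketch of that idea, not Hyrise's actual iterator code: `ToyAccessCounter` and `SkippingCursor` are hypothetical names made up for illustration. The cursor charges the whole segment up front, as the iterables do today, and refunds n - 1 accesses for every forward jump of n positions. Backward jumps, which a binary search may also perform, are deliberately left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a segment's access counter; not Hyrise's actual class.
struct ToyAccessCounter {
  std::uint64_t accessed_values = 0;
};

// Hypothetical cursor that, like the iterables referenced above, charges the
// counter for the whole segment as soon as it is created.
class SkippingCursor {
 public:
  SkippingCursor(std::size_t segment_size, ToyAccessCounter& counter)
      : _segment_size(segment_size), _counter(counter) {
    _counter.accessed_values += segment_size;  // assumes a full scan
  }

  // Single step: this position was already paid for up front.
  void increment() {
    assert(_position < _segment_size);
    ++_position;
  }

  // Forward jump by n: n positions were charged up front, but only the landing
  // position is read, so n - 1 accesses are refunded.
  void advance(std::ptrdiff_t n) {
    assert(n > 0 && _position + static_cast<std::size_t>(n) <= _segment_size);
    _position += static_cast<std::size_t>(n);
    _counter.accessed_values -= static_cast<std::uint64_t>(n - 1);
  }

  std::size_t position() const { return _position; }

 private:
  std::size_t _segment_size;
  std::size_t _position = 0;
  ToyAccessCounter& _counter;
};
```

With a segment of 100 values, two jumps of 50 and 25 positions leave the counter at 27 instead of 100. The positions behind the last jump are still charged as if they were going to be consumed, so this alone does not cover early termination.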


Another point, which has even less impact on TPC-H but could be tackled at the same time: Right now, we assume that an iterator is fully consumed. This is not the case for the Limit operator, which stops after a fixed number of rows. By checking in ~Iterator how many rows were not consumed, we could improve the accuracy there, too.
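
A rough sketch of the destructor-based correction, again with hypothetical names (`ToyRowCounter`, `LimitAwareCursor`) rather than Hyrise's real classes. The cursor is made non-copyable here on purpose: real iterators are copied frequently, and each copy would otherwise issue its own refund, so an actual implementation would have to decide which instance owns the correction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a segment's access counter; not Hyrise's actual class.
struct ToyRowCounter {
  std::uint64_t accessed_values = 0;
};

// Hypothetical cursor that charges the full segment on construction and gives
// back whatever was not consumed when it is destroyed.
class LimitAwareCursor {
 public:
  LimitAwareCursor(std::size_t segment_size, ToyRowCounter& counter)
      : _segment_size(segment_size), _counter(counter) {
    _counter.accessed_values += segment_size;  // assumes full consumption
  }

  // Non-copyable so that exactly one refund happens per charge.
  LimitAwareCursor(const LimitAwareCursor&) = delete;
  LimitAwareCursor& operator=(const LimitAwareCursor&) = delete;

  // Refund the rows that were charged but never read.
  ~LimitAwareCursor() {
    _counter.accessed_values -= _segment_size - _consumed;
  }

  // Consumes one row; returns false once the segment is exhausted.
  bool next() {
    if (_consumed == _segment_size) return false;
    ++_consumed;
    return true;
  }

 private:
  std::size_t _segment_size;
  std::size_t _consumed = 0;
  ToyRowCounter& _counter;
};
```

For example, a Limit of 10 on a segment of 1000 values would first be counted as 1000 accesses and corrected to 10 once the cursor goes out of scope.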

mrks, Apr 15 '21 20:04

Good point.

Also: I think this is strongly connected to #1531, because I assume that we still walk through our iterators step by step instead of jumping directly to the requested positions (at least I am not aware of recent changes in this space). So the access counters might even be too small.

Bouncner, Apr 15 '21 21:04