SegmentAccessCounters treat binary search as full scan
The basic assumption of the SegmentAccessCounters is that iterables are fully consumed, i.e., that every value of the iterable is touched:
https://github.com/hyrise/hyrise/blob/09943167c9dbbec99f0829673453c9244ce4108c/src/lib/storage/value_segment/value_segment_iterable.hpp#L34-L35
This is not necessarily the case. If a scan uses SortedSegmentSearch, which performs a binary search, the access counters are still incremented as if the entire segment were scanned:
https://github.com/hyrise/hyrise/blob/09943167c9dbbec99f0829673453c9244ce4108c/src/lib/operators/table_scan/sorted_segment_search.hpp#L53-L59
This does not break anything in my thesis, but only because scans have so little influence on the overall TPC-H performance. It should still be fixed.
As a first idea, `advance(ptrdiff_t n)` could decrease the access counter by `n - 1`.
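To make the idea concrete, here is a minimal standalone sketch. It is not Hyrise code: `AccessCounter`, `CountingIterator`, and `counted_lower_bound` are hypothetical names, and the segment is replaced by a plain sorted `std::vector<int>`. The counter is first charged for a full scan (the current assumption), and every forward jump of `n` positions refunds the `n - 1` values that were skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-segment access counter; not Hyrise's actual class.
struct AccessCounter {
  std::int64_t accesses = 0;
};

// Hypothetical position in a sorted vector that reports jumps to the counter.
class CountingIterator {
 public:
  CountingIterator(const std::vector<int>& values, AccessCounter& counter)
      : _values(&values), _counter(&counter) {}

  int operator*() const { return (*_values)[_index]; }
  std::size_t index() const { return _index; }

  // A forward jump of n touches one value and skips n - 1, so n - 1 is refunded.
  void advance(std::ptrdiff_t n) {
    assert(n >= 0 && _index + static_cast<std::size_t>(n) <= _values->size());
    _index += static_cast<std::size_t>(n);
    if (n > 1) _counter->accesses -= n - 1;
  }

 private:
  const std::vector<int>* _values;
  AccessCounter* _counter;
  std::size_t _index = 0;
};

// Lower-bound search that moves only via advance(). The counter is charged for
// a full scan up front, mirroring the "fully consumed" assumption.
std::size_t counted_lower_bound(const std::vector<int>& sorted, int target,
                                AccessCounter& counter) {
  counter.accesses += static_cast<std::int64_t>(sorted.size());
  CountingIterator first(sorted, counter);
  std::size_t count = sorted.size();
  while (count > 0) {
    const std::size_t step = count / 2;
    CountingIterator it = first;
    it.advance(static_cast<std::ptrdiff_t>(step));
    if (*it < target) {
      it.advance(1);  // single step, nothing skipped, nothing refunded
      first = it;
      count -= step + 1;
    } else {
      count = step;
    }
  }
  return first.index();
}
```

For 16 sorted values, the counter ends up at 5 or 7 depending on the target instead of 16. The refund is only an approximation: the remaining value stays above the number of dereferences when the search ends on the right side, so this narrows the gap rather than closing it.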
Another point, which has even less impact on TPC-H but could be tackled at the same time: Right now, we assume that an iterator is fully consumed. This is not the case for the Limit operator. By checking how many rows were not consumed in `~Iterator`, we could improve the accuracy there, too.
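A rough sketch of the destructor variant, again as standalone code rather than Hyrise's iterators: `ConsumptionCounter`, `RefundingIterator`, and `take` are hypothetical names. The iterator charges a full scan on construction and refunds whatever was not read when it is destroyed. Copying is disabled here so that the refund happens exactly once; real iterators are copied freely, so an actual implementation would need to handle that differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-segment access counter; not Hyrise's actual class.
struct ConsumptionCounter {
  std::int64_t accesses = 0;
};

// Hypothetical iterator: charges a full scan up front, refunds unread values in
// its destructor.
class RefundingIterator {
 public:
  RefundingIterator(const std::vector<int>& values, ConsumptionCounter& counter)
      : _values(values), _counter(counter) {
    _counter.accesses += static_cast<std::int64_t>(_values.size());
  }

  // Non-copyable so that each charge is refunded exactly once.
  RefundingIterator(const RefundingIterator&) = delete;
  RefundingIterator& operator=(const RefundingIterator&) = delete;

  ~RefundingIterator() {
    _counter.accesses -= static_cast<std::int64_t>(_values.size() - _index);
  }

  bool has_next() const { return _index < _values.size(); }
  int next() { return _values[_index++]; }

 private:
  const std::vector<int>& _values;
  ConsumptionCounter& _counter;
  std::size_t _index = 0;
};

// Reads at most `limit` values and then stops, as a Limit operator would.
std::vector<int> take(const std::vector<int>& values, std::size_t limit,
                      ConsumptionCounter& counter) {
  std::vector<int> result;
  RefundingIterator it(values, counter);
  while (result.size() < limit && it.has_next()) {
    result.push_back(it.next());
  }
  return result;
}
```

With 100 values and a limit of 10, the counter ends at 10 rather than 100. If the limit exceeds the segment size, nothing is refunded and the counter equals the segment size.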
Good point.
Also: I think this is strongly connected with #1531, because I assume that we still walk through our iterators step by step instead of jumping directly to the requested positions (at least I am not aware of recent changes in this area). So the access counters might even be too small.