h3
h3 copied to clipboard
Add range checks
Implement https://github.com/uber/h3/issues/392
Not sure if everything is correct, I did not dig into the algorithms. But at least it fixes https://github.com/ClickHouse/ClickHouse/issues/19219
There should not be significant performance degradation for two reasons:
- the branch is easy predictable and it can be executed in parallel due to instruction level parallelism;
- the algorithms are already heavy enough, the difference in a few clock ticks should be neglible.
Coverage decreased (-0.7%) to 98.516% when pulling af8068a99f3193ee050e3da81cc40f30a92e9d2e on ClickHouse-Extras:clickhouse-range-checks into dd0b1e554fec63fc4698fec98445fddcbbecc903 on uber:master.
I see there are some issues with formatting. I don't think I'm ready to finish this PR, feel free to apply any edits or close and resubmit another PR...
Thanks for submitting this! You should be able to fix the test failures with make format
. The bigger issue here is probably test coverage - we'll need to add tests for the new branches introduced here.
@isaacbrodsky - thoughts on bringing this into 3.x? This is potentially a security issue.
I haven't been following the issues daily so sorry for being late to the conversation again...
The lack of range checks on arrays throughout H3 core (including the recent baseCells
array merge) was intentional (as it is in the C language itself). Personally I think that performance degradation in H3 core should be avoided at all costs. The hit might be small for each change, but such things add up. Rather than changing the core functions, I would strongly encourage you to add "safe" wrapper functions that can be used in situations where the user doesn't want the onus of checking their own array indexes when necessary.
To @sahrk' point, another suggestion posted in the issues is to expand the base cells array and ensure we never calculate an index into that array outside of 0-127 inclusive.
I should point-out that the baseCells and the faces in faceIjkToBaseCell are not the only arrays that are accessed without bounds checking in the core. There are a whole bunch of arrays in the core, many with multiple dimensions. If I ever did bounds checking on any of them it was by mistake :).
The core was designed to be as tight as possible, which is why so much of it is table lookups, which are as cheap as it gets in C. In the case of the baseCells we're talking about a function that's just a wrapper for an array access. However you slice it, if you add bounds checking you're at least doubling the cost of the function.
I would like to ensure that H3 is resilient to dereferencing out of bounds. Beyond the security concerns, this type of crash is often very difficult to debug, especially when it happens in a library loaded into a manageed framework. I would be willing to create a new 3.x release to incorporate this kind of change.
That is all reasonable of course. Earlier I suggested making safe versions of table lookup functions that are exposed to external users (like baseCells), but what if you made the API functions safe and created internal functions without bounds checking for use just in the core?