Improvements in `hash_functions.cuh`
The file hash_functions.cuh has quite a bit of room for cleanup and improvement.
- Use
std::bytemore broadly. - Reduce the use of magic values in hash functions.
- Separate translation units if possible -- to what extent can we avoid having a single large header?
- Make error messages more consistent.
- Use
compute_bytesapproach instead of re-implementing hash function instring_viewtemplate instantiation. (Make sure [un]aligned reads are handled correctly.)
See PRs for reference: comments in #9919, ongoing SHA work in #9215, refactors in #10379.
Additional context
What about
std::byte, usingstd::to_integerto convert to the integer type needed at any point where we need computation other than the operators supported onstd::byte(only supports bitwise operations)?
Originally posted by @harrism in https://github.com/rapidsai/cudf/pull/9919#discussion_r781625077
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This is still relevant. I aim to work on a few more of these ideas after #11296, when I can also move SparkMurmurHash3_32 out of hash_functions.cuh. See also: https://github.com/rapidsai/cudf/pull/11292#discussion_r925887225
edit: that move is done in #11489.
We should also add documentation to detail methods in hashing.hpp.
@bdice We've made some improvements to hash_functions.cuh, should we close this?