cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Improvements in `hash_functions.cuh`

Open bdice opened this issue 4 years ago • 5 comments

The file hash_functions.cuh has quite a bit of room for cleanup and improvement.

  • Use std::byte more broadly.
  • Reduce the use of magic values in hash functions.
  • Separate translation units if possible -- to what extent can we avoid having a single large header?
  • Make error messages more consistent.
  • Use compute_bytes approach instead of re-implementing hash function in string_view template instantiation. (Make sure [un]aligned reads are handled correctly.)

See PRs for reference: comments in #9919, ongoing SHA work in #9215, refactors in #10379.

Additional context

What about std::byte, using std::to_integer to convert to the integer type needed at any point where we need computation other than the operators supported on std::byte (only supports bitwise operations)?

Originally posted by @harrism in https://github.com/rapidsai/cudf/pull/9919#discussion_r781625077

bdice avatar Jan 19 '22 23:01 bdice

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Feb 19 '22 00:02 github-actions[bot]

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Apr 01 '22 01:04 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar Jun 30 '22 01:06 github-actions[bot]

This is still relevant. I aim to work on a few more of these ideas after #11296, when I can also move SparkMurmurHash3_32 out of hash_functions.cuh. See also: https://github.com/rapidsai/cudf/pull/11292#discussion_r925887225

edit: that move is done in #11489.

bdice avatar Jul 21 '22 23:07 bdice

We should also add documentation to detail methods in hashing.hpp.

bdice avatar Jul 27 '22 22:07 bdice

@bdice We've made some improvements to hash_functions.cuh, should we close this?

GregoryKimball avatar Sep 27 '23 22:09 GregoryKimball