terminal icon indicating copy to clipboard operation
terminal copied to clipboard

Upgrade to rapidhash

Open Nicoshev opened this issue 1 year ago • 3 comments

Upgrading 64-bit from wyhash to rapidhash.

The wyhash owner has recognized rapidhash as the official successor in its repo.

rapidhash has shown to be faster and of higher quality than wyhash.

It is also officially compatible with MSVC across all architectures.

References and Relevant Issues

rapidhash is currently the fastest recommended hash function by SMHasher.

rapidhash is currently the fastest passing hash in SMHasher3.

Collision-based study showing a collision probability lower than wyhash and close to ideal.

Nicoshev avatar Jun 11 '24 23:06 Nicoshev

@microsoft-github-policy-service agree

Nicoshev avatar Jun 11 '24 23:06 Nicoshev

@check-spelling-bot Report

:red_circle: Please review

See the :open_file_folder: files view, the :scroll:action log, or :memo: job summary for details.

Unrecognized words (8)

CHPE contaning intrin Overwritres plast rapidhash stdint Xors

Previously acknowledged words that are now absent CRLFs Redir wcsicmp 🫥
To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the [email protected]:Nicoshev/windowsTerminal.git repository on the main branch (:information_source: how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/v0.0.22/apply.pl' |
perl - 'https://github.com/microsoft/terminal/actions/runs/9474153275/attempts/1'
Available :books: dictionaries could cover words (expected and unrecognized) not in the :blue_book: dictionary

This includes both expected items (2213) from .github/actions/spelling/expect/04cdb9b77d6827c0202f51acd4205b017015bfff.txt .github/actions/spelling/expect/alphabet.txt .github/actions/spelling/expect/expect.txt .github/actions/spelling/expect/web.txt and unrecognized words (8)

Dictionary Entries Covers Uniquely
cspell:cpp/src/lang-jargon.txt 11 1 1
cspell:swift/src/swift.txt 53 1 1
cspell:gaming-terms/dict/gaming-terms.txt 59 1 1
cspell:monkeyc/src/monkeyc_keywords.txt 123 1 1
cspell:cryptocurrencies/cryptocurrencies.txt 125 1 1

Consider adding them (in .github/workflows/spelling2.yml) for uses: check-spelling/[email protected] in its with:

      with:
        extra_dictionaries:
          cspell:cpp/src/lang-jargon.txt
          cspell:swift/src/swift.txt
          cspell:gaming-terms/dict/gaming-terms.txt
          cspell:monkeyc/src/monkeyc_keywords.txt
          cspell:cryptocurrencies/cryptocurrencies.txt

To stop checking additional dictionaries, add (in .github/workflows/spelling2.yml) for uses: check-spelling/[email protected] in its with:

check_extra_dictionaries: ''
Errors (2)

See the :open_file_folder: files view, the :scroll:action log, or :memo: job summary for details.

:x: Errors Count
:x: check-file-path 1
:x: ignored-expect-variant 3

See :x: Event descriptions for more information.

:pencil2: Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

:microscope: You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. :wink:

If the flagged items are :exploding_head: false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it, try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

github-actions[bot] avatar Jun 11 '24 23:06 github-actions[bot]

@check-spelling-bot Report

:red_circle: Please review

See the :open_file_folder: files view, the :scroll:action log, or :memo: job summary for details.

Unrecognized words (8)

CHPE contaning intrin Overwritres plast rapidhash stdint Xors

Previously acknowledged words that are now absent CRLFs Redir wcsicmp 🫥
To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the [email protected]:Nicoshev/windowsTerminal.git repository on the main branch (:information_source: how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/v0.0.22/apply.pl' |
perl - 'https://github.com/microsoft/terminal/actions/runs/9474397513/attempts/1'
Available :books: dictionaries could cover words (expected and unrecognized) not in the :blue_book: dictionary

This includes both expected items (2213) from .github/actions/spelling/expect/04cdb9b77d6827c0202f51acd4205b017015bfff.txt .github/actions/spelling/expect/alphabet.txt .github/actions/spelling/expect/expect.txt .github/actions/spelling/expect/web.txt and unrecognized words (8)

Dictionary Entries Covers Uniquely
cspell:cpp/src/lang-jargon.txt 11 1 1
cspell:swift/src/swift.txt 53 1 1
cspell:gaming-terms/dict/gaming-terms.txt 59 1 1
cspell:monkeyc/src/monkeyc_keywords.txt 123 1 1
cspell:cryptocurrencies/cryptocurrencies.txt 125 1 1

Consider adding them (in .github/workflows/spelling2.yml) for uses: check-spelling/[email protected] in its with:

      with:
        extra_dictionaries:
          cspell:cpp/src/lang-jargon.txt
          cspell:swift/src/swift.txt
          cspell:gaming-terms/dict/gaming-terms.txt
          cspell:monkeyc/src/monkeyc_keywords.txt
          cspell:cryptocurrencies/cryptocurrencies.txt

To stop checking additional dictionaries, add (in .github/workflows/spelling2.yml) for uses: check-spelling/[email protected] in its with:

check_extra_dictionaries: ''
Errors (2)

See the :open_file_folder: files view, the :scroll:action log, or :memo: job summary for details.

:x: Errors Count
:x: check-file-path 1
:x: ignored-expect-variant 3

See :x: Event descriptions for more information.

:pencil2: Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

:microscope: You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. :wink:

If the flagged items are :exploding_head: false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it, try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

github-actions[bot] avatar Jun 12 '24 00:06 github-actions[bot]

Thanks for all the effort here, Nicolas!

I have some concerns, plus a couple "favorite things" about what we're doing today.

Right now, our implementation of wyhash is fully contained and inlined into til, and nobody needs to include any further headers or take on any statics[^1].

I'm concerned that we are also only replacing the 64-bit hash function, so we're not fully pulling wyhash from our code. We will forever have two OSS hash library dependencies rather than just the one.

All that said, I'd still be interested... but we are also not using it for any data larger than, say, 1kb.

I'm not certain the performance boost for our specific workload is worth switching, all other things taken into consideration. If we were going to be processing untold heaps of data I might feel differently, but... eh. It's written now, but so is the code we had before. 🤷🏻

[^1]: Doesn't the compiler emit one of these into every translation unit? We've got a lot of TUs that include this...

DHowett avatar Jul 08 '24 22:07 DHowett

Hey @DHowett, thanks a lot for the review!

The global variable should not be replicated accross translation units because rapidhash.h is only used by hash.h, which states '#pragma once' before including it. For extra security, I've added '#pragma once' to rapidhash.h :)

I've also just updated the rapidhash header to a newer version, which has greater compatibility with C++. The global variable is now marked as 'constexpr' within C++ compilation, hinting the compiler to inline its value whenever used. Nevertheless, this optimization was still observed when compiling a C file including the header. The new updated header file also adds the noexcept qualifiers, which allowed the removal of warning supressing code.

Having the 64-bit hash function undocked may be a favorable thing looking forward. The last commit shows how effortless is to update the hash implementation. It should be easy to perform any change to the hash function if needed in the future.

wyhash has been deprecated by its owner, it may be counterproductive to use unmaintained code.

Nicoshev avatar Jul 09 '24 06:07 Nicoshev

Thanks!

#pragma once doesn't prevent a header from being included in every translation unit; it just prevents a header from being included more than once in the same TU!

I'm glad that it was easy to update. That is definitely a point in its favor!

FWIW, though, on maintenance: does wyhash need maintenance? Will rapidhash need maintenance? I'm firmly of the belief that a project can be "done". If a vulnerability is discovered somehow in the hashing algorithm that would put us at risk, the work is the same to fix it as it is to move to a different hash function. Why move now, from a project that is entirely complete, in the absence of such a vulnerability, for the use cases we have?

DHowett avatar Jul 09 '24 15:07 DHowett

It is unclear if wyhash or rapidhash will benefit from maintenance.

New language standards may introduce qualifiers or notions from which the hash functions could benefit.

I do agree that this change is not absolutely necesary, although it does leave the codebase modernized and will produce a slight battery life improvement on devices using the terminal. Additionally, having available the lastest version of the hash function may be useful for a future arising need.

I can also change the PR to implement rapidhash in-place within hash.h, if that's preferable.

Thanks, Nicolas

Nicoshev avatar Jul 09 '24 17:07 Nicoshev

It seems like there's not a measurable improvement from changing this at this time. We may as well spare everyone engineering effort by leaving this for now. If in the future there's a measurable improvement to be gained from switching, we can revisit this. Thanks!

zadjii-msft avatar Nov 14 '24 20:11 zadjii-msft