large-hashable icon indicating copy to clipboard operation
large-hashable copied to clipboard

Fix bug #25 for GHC 9.4 and text-2

Open skogsbaer opened this issue 2 years ago • 6 comments

skogsbaer avatar May 19 '23 08:05 skogsbaer

@sternenseemann @tvh we should merge this PR. Who has the rights to do so?

skogsbaer avatar May 26 '23 14:05 skogsbaer

Lazy text also uses UTF-8 internally (it's just a list of strict tests). I don't see a way to retain backwards compatibility without introducing a performance overhead, e.g. by converting UTF-8 to UTF-16 to do the hashing.

skogsbaer avatar May 27 '23 16:05 skogsbaer

I agree there is no way to avoid that overhead. The reason why I mentioned lazy text is that it would avoid having to have all in memory at the same time.

tvh avatar May 27 '23 16:05 tvh

Here are some benchmark numbers for a 1MB text. I only report the mean of the benchmark executed with criterion.

  • Hash native UTF-8 encoding: 1.412 ms
  • Hash UTf-16 encoding (for backwards compatibility): 16.53 ms
  • Convert text to string, then hash the chars: 19.46 ms
  • Directly hash the string: 14.02 ms

So hashing the native UTF-8 encoding is much faster (as expected).

skogsbaer avatar Oct 01 '24 07:10 skogsbaer

@skogsbaer I think you will need to talk to @loisch whether this change is okay. His assumption is that the hashes coming out of this lib are stable.

For the lazy text, I think I just wanted to make sure that

  1. We don't convert the text into strict first
  2. The hash is the same as for strict text

tvh avatar Oct 01 '24 12:10 tvh

@tvh There are only few places where stability for meqo is required. My plan is to add a new (optional) method largeHashStable and to rewrite meqo code to use largeHashStable where needed.

skogsbaer avatar Oct 02 '24 06:10 skogsbaer

@tvh There is now largeHashStable with tests

skogsbaer avatar Oct 07 '24 15:10 skogsbaer

Great!

tvh avatar Oct 07 '24 16:10 tvh

Merge?

skogsbaer avatar Oct 07 '24 19:10 skogsbaer

@skogsbaer I don't have permissions on this repo anymore.

tvh avatar Oct 07 '24 19:10 tvh