large-hashable
Fix bug #25 for GHC 9.4 and text-2
@sternenseemann @tvh we should merge this PR. Who has the rights to do so?
Lazy text also uses UTF-8 internally (it's just a list of strict texts). I don't see a way to retain backwards compatibility without introducing a performance overhead, e.g. by converting UTF-8 to UTF-16 before hashing.
I agree there is no way to avoid that overhead. The reason I mentioned lazy text is that it would avoid having to keep the whole text in memory at the same time.
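This is the traversal shape I have in mind, as a minimal sketch: FNV-1a stands in for the library's real hash state, and the UTF-16LE re-encoding is the backwards-compatibility path mentioned above.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

-- FNV-1a as a stand-in for the library's incremental hash state.
fnv1a :: Word64 -> B.ByteString -> Word64
fnv1a = B.foldl' (\h b -> (h `xor` fromIntegral b) * 0x100000001b3)

-- Hash a lazy Text chunk by chunk, never forcing it into one strict value.
-- Re-encoding each chunk as UTF-16LE reproduces the pre-text-2 byte stream
-- (chunks never split a code point, so per-chunk encoding is safe); using
-- TE.encodeUtf8 instead gives the fast native path.
hashLazyCompat :: TL.Text -> Word64
hashLazyCompat =
    TL.foldlChunks (\h c -> fnv1a h (TE.encodeUtf16LE c)) 0xcbf29ce484222325
```

Because the hash consumes bytes sequentially with no per-chunk framing, chunk boundaries cannot influence the result.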
Here are some benchmark numbers for a 1 MB text. I report only the mean as measured with criterion.
- Hash native UTF-8 encoding: 1.412 ms
- Hash UTF-16 encoding (for backwards compatibility): 16.53 ms
- Convert text to string, then hash the chars: 19.46 ms
- Directly hash the string: 14.02 ms
So hashing the native UTF-8 encoding is much faster (as expected).
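For reference, the setup was roughly the following; this is a self-contained sketch where `byteSum` stands in for the actual hash function so the example runs as-is.

```haskell
import Criterion.Main
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Cheap stand-in for the real hash function.
byteSum :: B.ByteString -> Int
byteSum = B.foldl' (\acc w -> acc + fromIntegral w) 0

main :: IO ()
main = defaultMain
    [ env (pure (T.replicate 1000000 (T.pack "a"))) $ \t ->
        bgroup "hash 1MB text"
            [ bench "native UTF-8 bytes" $ nf (byteSum . TE.encodeUtf8) t
            , bench "via UTF-16 bytes"   $ nf (byteSum . TE.encodeUtf16LE) t
            , bench "via String"         $ nf (sum . map fromEnum . T.unpack) t
            ]
    ]
```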
@skogsbaer I think you will need to talk to @loisch about whether this change is okay. His assumption is that the hashes coming out of this lib are stable.
For the lazy text, I think I just wanted to make sure that
- We don't convert the text into strict first
- The hash is the same as for strict text (see the property sketch below)
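Something along the lines of this property, again with FNV-1a as a stand-in hash: any chunking of the same text must produce the same hash as the strict version.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)
import Test.QuickCheck
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

-- Stand-in hash, as before.
fnv1a :: Word64 -> B.ByteString -> Word64
fnv1a = B.foldl' (\h b -> (h `xor` fromIntegral b) * 0x100000001b3)

hashStrict :: T.Text -> Word64
hashStrict = fnv1a 0xcbf29ce484222325 . TE.encodeUtf8

hashLazy :: TL.Text -> Word64
hashLazy = TL.foldlChunks (\h c -> fnv1a h (TE.encodeUtf8 c)) 0xcbf29ce484222325

-- Chunk boundaries must not influence the hash.
prop_lazyMatchesStrict :: String -> Positive Int -> Bool
prop_lazyMatchesStrict s (Positive n) =
    let t = T.pack s
    in hashLazy (TL.fromChunks (T.chunksOf n t)) == hashStrict t

main :: IO ()
main = quickCheck prop_lazyMatchesStrict
```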
@tvh There are only a few places where meqo requires hash stability. My plan is to add a new (optional) method largeHashStable and to rewrite the meqo code to use largeHashStable where needed.
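Roughly this shape; a toy sketch only, since the real class in this library is monadic, and everything here except the name largeHashStable is made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

class LargeHashable' a where
    -- Bytes fed to the hash in the fast, native layout.
    largeHashBytes :: a -> B.ByteString
    -- Optional: bytes in the old layout, for callers that need stable
    -- hashes. The default means only types whose layout changed between
    -- text-1 and text-2 have to override it.
    largeHashStable :: a -> B.ByteString
    largeHashStable = largeHashBytes

instance LargeHashable' T.Text where
    largeHashBytes  = TE.encodeUtf8     -- text-2 native UTF-8
    largeHashStable = TE.encodeUtf16LE  -- pre-text-2 UTF-16 byte layout
```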
@tvh There is now largeHashStable with tests
Great!
Merge?
@skogsbaer I don't have permissions on this repo anymore.