bloom-filter-scala

CanGenerateHashFromString is broken in JDK 9+ when the string contains non-Latin-1 characters or the -XX:-CompactStrings JVM flag is used

Open seanrohead opened this issue 4 years ago • 7 comments

CanGenerateHashFromStringByteArray, which is used for JDK 9+, assumes that the string is stored with one byte per character, so that the length of the underlying byte[] equals the length of the string. This assumption holds only if the string contains only characters from the ISO-8859-1/Latin-1 character set. If the string contains any other character, it is stored in the underlying byte array as UTF-16 and the length of the byte array is twice the number of characters in the string. It is also possible to disable this storage optimization with the -XX:-CompactStrings JVM flag, in which case all strings are stored as UTF-16. See here and here for more information.
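
To make the length mismatch concrete, here is a minimal Java sketch that models the storage rule described above. It does not read the private `String.value` field. The class `CompactStringSketch` and its methods are hypothetical names for illustration, and `UTF_16LE` is an assumption standing in for the JVM's native byte order (the byte count is the same for either order).

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only: mimics how JDK 9+ picks a String's internal
// encoding. It never touches the real private String.value field.
final class CompactStringSketch {

    // True if every char fits in one byte (ISO-8859-1 / Latin-1).
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Length of the backing byte[] under this model.
    // compactStrings = false models running with -XX:-CompactStrings.
    static int backingLength(String s, boolean compactStrings) {
        if (compactStrings && isLatin1(s)) {
            // One byte per char.
            return s.getBytes(StandardCharsets.ISO_8859_1).length;
        }
        // UTF-16 without a byte-order mark: two bytes per char.
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        System.out.println(backingLength("hello", true));        // 5
        System.out.println(backingLength("\u4f60\u597d", true)); // 4 (2 chars)
        System.out.println(backingLength("hello", false));       // 10
    }
}
```

A hash function that walks the backing array using `s.length()` as the byte count would therefore cover only half of the data in the second and third cases.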

seanrohead, Mar 19 '21 23:03

I opened a pull request for this: https://github.com/alexandrnikitin/bloom-filter-scala/pull/54/files

seanrohead, Mar 19 '21 23:03

I have a similar error, but with CanGenerateHashFromString:

Caused by: java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
    at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:27)
    at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:23)
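
For readers decoding the trace: `[B` is the JVM name for `byte[]` and `[C` is `char[]`. JDK 8 backs a `String` with a `char[]`, while JDK 9+ backs it with a `byte[]`, so the message is consistent with a code path written for the JDK 8 layout running on a newer JVM. The sketch below only reproduces that kind of cast on plain arrays. `ValueCastSketch` is a hypothetical name, and nothing here is taken from the library's source.

```java
// Illustrative only: reproduces the failing cast on plain arrays,
// without reading any String internals.
final class ValueCastSketch {

    // Returns true when `value` can be used as a char[], and false when the
    // cast throws ClassCastException (for example when it is a byte[]).
    static boolean castsToCharArray(Object value) {
        try {
            char[] chars = (char[]) value;
            return chars != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JDK 8 style backing array.
        System.out.println(castsToCharArray(new char[] {'a'})); // true
        // JDK 9+ style backing array.
        System.out.println(castsToCharArray(new byte[] {97}));  // false
    }
}
```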

yarosman, Mar 31 '21 19:03

@yarosman Are you using the latest version of the library? That issue was fixed in 0.13.0.

seanrohead, Mar 31 '21 20:03

> @yarosman Are you using the latest version of the library? That issue was fixed in 0.13.0.

@seanrohead We use 0.13.1

yarosman, Mar 31 '21 21:03

@yarosman Are you loading the bloom filter using serialization by any chance?

seanrohead, Mar 31 '21 22:03

@seanrohead Yes, we do. I found that we don't use the predefined writeTo/readTo methods, so we serialize the filter together with its CanGenerateHashFrom instance, which depends on the Java version. Or do you have another explanation or idea?

yarosman, Apr 01 '21 06:04

> Have similar error but with CanGenerateHashFromString
>
> Caused by: java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
>     at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:27)
>     at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:23)

Did you try using CanGenerateHashFromStringByteArray?

yufan022, May 27 '22 03:05