docker-languagetool

Checksum error since 5.7

Open Write opened this issue 2 years ago • 7 comments

Hi. Using the latest container image on linux/amd64 (5.11.0-49-generic #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux), every check request fails with a checksum error related to the ngrams, whichever language is requested. Reverting to 5.6-dockerupdate-3 works just fine.

Example on 5.6 (working): CleanShot 2022-04-01 at 00 59 31

On 5.7:

At first I thought my French ngrams were corrupted, but the result is the same in English, and I redownloaded all ngrams just in case.

CleanShot 2022-04-01 at 01 02 13

The following configuration is passed to LanguageTool:
languageModel=/ngrams
+ java -Xms512m -Xmx1g -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8010 --public --allow-origin '*' --config config.properties
2022-03-31 23:00:54.436 +0000 INFO  org.languagetool.server.DatabaseAccessOpenSource Not setting up database access, dbDriver is not configured
2022-03-31 23:00:54 +0000 WARNING: running in HTTP mode, consider running LanguageTool behind a reverse proxy that takes care of encryption (HTTPS)
2022-03-31 23:00:54 +0000 WARNING: running in public mode, LanguageTool API can be accessed without restrictions!
2022-03-31 23:00:54 +0000 Setting up thread pool with 10 threads
2022-03-31 23:00:55 +0000 Starting LanguageTool 5.7 (build date: 2022-03-30 13:58:36 +0000, 35d0d40) server on http://localhost:8010...
2022-03-31 23:00:55 +0000 Server started
2022-03-31 23:00:57.496 +0000 INFO  org.languagetool.server.LanguageToolHttpHandler Handling POST /v2/check
2022-03-31 23:01:02.143 +0000 ERROR org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx"))), detected: fr', sending HTTP code 500. Access from 172.18.0.1, HTTP user agent: curl/7.74.0, User agent param: null, Referrer: null, language: fr, h: 1, r: 1, time: 4656text length: 8, m: ALL, l: DEFAULT, Stacktrace follows:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx"))), detected: fr
        at org.languagetool.server.TextChecker.checkText(TextChecker.java:496)
        at org.languagetool.server.ApiV2.handleCheckRequest(ApiV2.java:173)
        at org.languagetool.server.ApiV2.handleRequest(ApiV2.java:84)
        at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:185)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
        at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
        at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:730)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
        at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:699)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.languagetool.server.TextChecker.checkText(TextChecker.java:477)
        ... 12 more
Caused by: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:186)
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.addIndex(LuceneSingleIndexLanguageModel.java:118)
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.<init>(LuceneSingleIndexLanguageModel.java:93)
        at org.languagetool.languagemodel.LuceneLanguageModel.<init>(LuceneLanguageModel.java:65)
        at org.languagetool.Language.initLanguageModel(Language.java:180)
        at org.languagetool.language.French.getLanguageModel(French.java:149)
        at org.languagetool.JLanguageTool.activateLanguageModelRules(JLanguageTool.java:594)
        at org.languagetool.server.Pipeline.activateLanguageModelRules(Pipeline.java:103)
        at org.languagetool.server.PipelinePool.createPipeline(PipelinePool.java:121)
        at org.languagetool.server.PipelinePool.getPipeline(PipelinePool.java:78)
        at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:789)
        at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:743)
        at org.languagetool.server.TextChecker.lambda$checkText$4(TextChecker.java:460)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        ... 3 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
        at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:334)
        at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:364)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
        at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsReader(Lucene50StoredFieldsFormat.java:173)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:117)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:241)
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:229)
        at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:182)
        ... 16 more

2022-03-31 23:01:02.171 +0000 INFO  org.languagetool.server.LanguageToolHttpHandler Handled request in 4685ms; sending code 500

Write, Mar 31 '22 23:03
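The exception above comes from Lucene's footer check on /ngrams/fr/1grams/_16.fdx. A minimal sketch for repeating that check outside the JVM follows. It assumes the footer layout described in Lucene's CodecUtil documentation: a 4-byte magic, a 4-byte algorithm ID, then an 8-byte CRC32 of all preceding bytes, all big-endian. The function name `check_lucene_footer` is made up for this example, and files in the index directory that carry no such footer will simply raise `ValueError`.

```python
import os
import struct
import zlib

# Assumption: footer layout as documented for Lucene's CodecUtil.
FOOTER_MAGIC = 0xC02893E8  # bitwise complement of the codec header magic 0x3FD76C17
FOOTER_LENGTH = 16         # magic (4) + algorithm id (4) + checksum (8)


def check_lucene_footer(path, chunk_size=1 << 20):
    """Return (stored_crc, computed_crc) for one Lucene index file.

    The file is streamed in chunks so that large ngram files do not
    have to fit in memory.
    """
    size = os.path.getsize(path)
    if size < FOOTER_LENGTH:
        raise ValueError(f"{path}: too short to contain a footer")

    crc = 0
    remaining = size - 8  # everything except the stored checksum itself
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                raise OSError(f"{path}: unexpected end of file")
            crc = zlib.crc32(chunk, crc)
            remaining -= len(chunk)

        # Re-read the whole footer to validate magic and algorithm id.
        handle.seek(size - FOOTER_LENGTH)
        magic, algorithm, stored = struct.unpack(">IIQ", handle.read(FOOTER_LENGTH))

    if magic != FOOTER_MAGIC:
        raise ValueError(f"{path}: footer magic mismatch ({magic:08x})")
    if algorithm != 0:
        raise ValueError(f"{path}: unknown checksum algorithm {algorithm}")
    return stored, crc


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        stored, computed = check_lucene_footer(name)
        verdict = "ok" if stored == computed else "MISMATCH"
        print(f"{name}: stored={stored:x} computed={computed:x} {verdict}")
```

Run on the host against the same file that is mounted into the container. If the two values agree on the host but the server still reports a mismatch, the file on disk is probably intact and the problem more likely lies in how the container reads it.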

Hi @Write, could you try increasing both the minimum and maximum heap sizes and see if that solves the issue?

Erikvl87, Apr 07 '22 08:04

Tried Xms1G and Xmx2g, up to Xmx3g. Unfortunately, the error still occurs.

Write, Apr 07 '22 13:04
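To re-test after each change (heap size, image tag, user), the failing request from the log can be replayed. The sketch below assumes the server is reachable on http://localhost:8010 as in the startup log and uses the `text` and `language` form parameters of the /v2/check endpoint. The sample text and the helper names are made up for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_check_request(text, language, base_url="http://localhost:8010"):
    """Build a form-encoded POST to /v2/check without sending it."""
    body = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/v2/check", data=body, method="POST")


def run_check(text, language, base_url="http://localhost:8010"):
    """Send the request and return (status, parsed JSON or raw error body)."""
    request = build_check_request(text, language, base_url)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server in this thread answers 500 when the ngram index fails to load.
        return err.code, err.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    status, payload = run_check("Bonjour!", "fr")
    print(status)
    print(payload if isinstance(payload, str) else len(payload.get("matches", [])))
```

A 500 status with the CorruptIndexException text in the body means the error is still present. A 200 status means the language model loaded for that language.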

I have this issue as well. I believe it was introduced with the latest update. It is broken as is and can't be used with the browser extension.

gerroon, Apr 21 '22 22:04

The 5.6-dockerupdate-3 version seems to work. I rolled back to it.

gerroon, Apr 21 '22 22:04

Now I have the issue with 5.6-dockerupdate-3 too, absolutely no clue 🤷

Well, adding user: 0:0 to run as root worked for that issue; however, I tried the same with 5.7 and still get the same error.

Write, Apr 24 '22 03:04
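That running as root made a difference hints that file permissions on the mounted ngram directory might play a role. A permission problem would normally surface as an access error rather than a checksum failure, so this is only a guess. A rough sketch for listing files that a given non-root uid could not read, based on mode bits alone, could look like this. The uid 1000 in the usage line is hypothetical, since the uid the container runs as is not stated in this thread. ACLs and directory search permissions are not considered.

```python
import os
import stat


def readable_by(st, uid, gids=()):
    """Approximate POSIX read check from mode bits (no ACLs, no capabilities)."""
    if uid == 0:
        return True  # root bypasses ordinary read permission checks
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


def unreadable_files(root, uid, gids=()):
    """Return a sorted list of files under root that uid could not read."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not readable_by(os.stat(path), uid, gids):
                blocked.append(path)
    return sorted(blocked)


if __name__ == "__main__":
    # Hypothetical uid/gid; replace with the ids the container actually uses.
    for path in unreadable_files("/ngrams", uid=1000, gids=(1000,)):
        print(path)
```

An empty result for the container's uid would make a plain permission problem unlikely and point back at the index files or the way they are read.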

Sorry, due to personal circumstances I was not able to spend a lot of time on this.

I was never able to reproduce this, and I am not sure whether the issue lies within this dockerized version of LanguageTool. Are you perhaps running this on a Synology NAS? Are the disks OK? Are you able to do a full drive scan and check for bad sectors?

Erikvl87, Jun 29 '22 07:06

No idea, I'm not using a Synology NAS, only SSDs. Everything is perfect. My only idea is that there's some sort of issue with RAM allocation and the Java JVM. For now I'm using another image (image: ghcr.io/someone-stole-my-name/docker-languagetool) with - JAVAOPTIONS=-Xms512M -Xmx2G and it works so far.

Write, Jun 29 '22 07:06