docker-languagetool
Checksum error since 5.7
Hi, using the linux/amd64 container (5.11.0-49-generic #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux), the latest image outputs a checksum error related to ngrams, whichever language is requested. Reverting to 5.6-dockerupdate-3 works just fine.
Example on 5.6 (working) :
On 5.7 :
At first I thought my French ngrams were corrupted, but the result is the same in English, and I redownloaded all ngrams just in case.
The following configuration is passed to LanguageTool:
languageModel=/ngrams
+ java -Xms512m -Xmx1g -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8010 --public --allow-origin '*' --config config.properties
2022-03-31 23:00:54.436 +0000 INFO org.languagetool.server.DatabaseAccessOpenSource Not setting up database access, dbDriver is not configured
2022-03-31 23:00:54 +0000 WARNING: running in HTTP mode, consider running LanguageTool behind a reverse proxy that takes care of encryption (HTTPS)
2022-03-31 23:00:54 +0000 WARNING: running in public mode, LanguageTool API can be accessed without restrictions!
2022-03-31 23:00:54 +0000 Setting up thread pool with 10 threads
2022-03-31 23:00:55 +0000 Starting LanguageTool 5.7 (build date: 2022-03-30 13:58:36 +0000, 35d0d40) server on http://localhost:8010...
2022-03-31 23:00:55 +0000 Server started
2022-03-31 23:00:57.496 +0000 INFO org.languagetool.server.LanguageToolHttpHandler Handling POST /v2/check
2022-03-31 23:01:02.143 +0000 ERROR org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx"))), detected: fr', sending HTTP code 500. Access from 172.18.0.1, HTTP user agent: curl/7.74.0, User agent param: null, Referrer: null, language: fr, h: 1, r: 1, time: 4656text length: 8, m: ALL, l: DEFAULT, Stacktrace follows:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx"))), detected: fr
at org.languagetool.server.TextChecker.checkText(TextChecker.java:496)
at org.languagetool.server.ApiV2.handleCheckRequest(ApiV2.java:173)
at org.languagetool.server.ApiV2.handleRequest(ApiV2.java:84)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:185)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:730)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:699)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.languagetool.server.TextChecker.checkText(TextChecker.java:477)
... 12 more
Caused by: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:186)
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.addIndex(LuceneSingleIndexLanguageModel.java:118)
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.<init>(LuceneSingleIndexLanguageModel.java:93)
at org.languagetool.languagemodel.LuceneLanguageModel.<init>(LuceneLanguageModel.java:65)
at org.languagetool.Language.initLanguageModel(Language.java:180)
at org.languagetool.language.French.getLanguageModel(French.java:149)
at org.languagetool.JLanguageTool.activateLanguageModelRules(JLanguageTool.java:594)
at org.languagetool.server.Pipeline.activateLanguageModelRules(Pipeline.java:103)
at org.languagetool.server.PipelinePool.createPipeline(PipelinePool.java:121)
at org.languagetool.server.PipelinePool.getPipeline(PipelinePool.java:78)
at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:789)
at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:743)
at org.languagetool.server.TextChecker.lambda$checkText$4(TextChecker.java:460)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
... 3 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=97ec8ffc actual=901b5b3c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/ngrams/fr/1grams/_16.fdx")))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:334)
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:364)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsReader(Lucene50StoredFieldsFormat.java:173)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:117)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:241)
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:229)
at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:182)
... 16 more
2022-03-31 23:01:02.171 +0000 INFO org.languagetool.server.LanguageToolHttpHandler Handled request in 4685ms; sending code 500
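For anyone trying to reproduce this, the failing request in the log above (a POST to /v2/check on port 8010 with language fr) can be sketched roughly as follows. The sample text, the helper names `build_check_request` and `check`, and the base URL are assumptions for illustration; only the endpoint, port, and language come from the log.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_check_request(text, language, base_url="http://localhost:8010"):
    """Build a form-encoded POST to /v2/check (helper name is hypothetical)."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(base_url + "/v2/check", data=data, method="POST")


def check(text, language, base_url="http://localhost:8010"):
    """Send the request and return (status, body).

    On an HTTP error such as the 500 seen in the log, the raw error body
    is returned as text instead of parsed JSON.
    """
    req = build_check_request(text, language, base_url)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `check("Bonjour!", "fr")` against each image version should make it easy to compare the status codes side by side.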
Hi @Write, could you try increasing both minimal and maximum heap sizes and see if that solves the issue?
Tried Xms1G and Xmx2g, up to Xmx3g; unfortunately, the error still occurs.
I have this issue as well; I believe it was introduced with the latest update. It is broken as is and can't be used with the browser extension.
The 5.6-dockerupdate-3 version seems to work. I rolled back to it.
Now I have the issue too with 5.6-dockerupdate-3, absolutely no clue 🤷
Well, adding user: 0:0 to run as root worked for my issue; however, I tried it with 5.7 and got the same error.
Sorry, due to personal circumstances I was not able to spend a lot of time on this.
I never got this reproduced, and I am not sure if the issue would be within this dockerized version of LanguageTool. Are you perhaps running this on a Synology NAS? Are the disks OK? Are you able to do a full drive scan and check for bad sectors?
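One way to tell real on-disk corruption apart from a problem inside the container might be to recompute the checksum outside LanguageTool. The sketch below assumes the usual Lucene footer layout as I understand it from `CodecUtil` (a 4-byte magic, a 4-byte algorithm ID, then an 8-byte CRC32 covering everything before it, all big-endian in this index format); the function name `lucene_footer_checksums` is made up for this example.

```python
import struct
import sys
import zlib

# Assumption: footer magic is the bitwise complement of Lucene's CODEC_MAGIC (0x3FD76C17).
FOOTER_MAGIC = 0xC02893E8
FOOTER_LENGTH = 16  # 4-byte magic + 4-byte algorithm ID + 8-byte checksum


def lucene_footer_checksums(path, chunk_size=1 << 20):
    """Return (expected, actual) CRC32 values for one Lucene index file.

    expected is the value stored in the footer; actual is recomputed
    from the file contents, excluding the stored 8-byte checksum.
    """
    crc = 0
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        if size < FOOTER_LENGTH:
            raise ValueError(f"{path}: too short to contain a footer")
        f.seek(0)
        remaining = size - 8  # the CRC covers the magic and algorithm ID too
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise IOError(f"{path}: unexpected end of file")
            crc = zlib.crc32(chunk, crc)
            remaining -= len(chunk)
        f.seek(size - FOOTER_LENGTH)
        magic, _algorithm_id, expected = struct.unpack(">IIQ", f.read(FOOTER_LENGTH))
    if magic != FOOTER_MAGIC:
        raise ValueError(f"{path}: footer magic not found")
    return expected, crc & 0xFFFFFFFF


if __name__ == "__main__":
    for name in sys.argv[1:]:
        expected, actual = lucene_footer_checksums(name)
        status = "OK" if expected == actual else "MISMATCH"
        print(f"{status} expected={expected:08x} actual={actual:08x} {name}")
```

If this reports OK on the host for a file such as /ngrams/fr/1grams/_16.fdx while the server still fails inside the container, that would point away from the disks, though it would not prove anything by itself.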
No idea, not using a Synology NAS.
Only using SSD. Everything is perfect.
My only idea is that there's some sort of issue with RAM allocation and the JVM. For now I'm using another image (image: ghcr.io/someone-stole-my-name/docker-languagetool) with - JAVAOPTIONS=-Xms512M -Xmx2G and it works so far.