
n-grams analysis (using `--languagemodel`) gives `java.util.ServiceConfigurationError`

Open sim590 opened this issue 6 years ago • 10 comments

When running the following command (file read from stdin):

textidote --languagemodel /path/containing/fr/directory --html --dict .ltignore --check fr

I end up with the following error:

Using N-grams from /home/simon/Téléchargements
TeXtidote v0.7 - A linter for LaTeX documents and others
(C) 2018-2019 Sylvain Hallé - All rights reserved

Exception in thread "main" java.util.ServiceConfigurationError: Cannot instantiate SPI class: org.apache.lucene.codecs.lucene50.Lucene50Codec
	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:82)
	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:51)
	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:38)
	at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:47)
	at org.apache.lucene.codecs.Codec.forName(Codec.java:113)
	at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:469)
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:361)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:53)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
	at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:242)
	at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:230)
	at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:183)
	at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.addIndex(LuceneSingleIndexLanguageModel.java:119)
	at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.<init>(LuceneSingleIndexLanguageModel.java:93)
	at org.languagetool.languagemodel.LuceneLanguageModel.<init>(LuceneLanguageModel.java:65)
	at org.languagetool.language.French.getLanguageModel(French.java:132)
	at org.languagetool.JLanguageTool.activateLanguageModelRules(JLanguageTool.java:341)
	at ca.uqac.lif.textidote.rules.CheckLanguage.activateLanguageModelRules(CheckLanguage.java:241)
	at ca.uqac.lif.textidote.Main.mainLoop(Main.java:546)
	at ca.uqac.lif.textidote.Main.mainLoop(Main.java:124)
	at ca.uqac.lif.textidote.Main.main(Main.java:110)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [Lucene40, Lucene41]
	at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:114)
	at org.apache.lucene.codecs.PostingsFormat.forName(PostingsFormat.java:112)
	at org.apache.lucene.codecs.lucene50.Lucene50Codec.<init>(Lucene50Codec.java:155)
	at org.apache.lucene.codecs.lucene50.Lucene50Codec.<init>(Lucene50Codec.java:75)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at java.base/java.lang.Class.newInstance(Class.java:584)
	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:72)
	... 23 more

The n-grams files were downloaded from the following URL:

https://languagetool.org/download/ngram-data/

as instructed by the LanguageTool documentation page.

sim590 avatar Jan 19 '19 12:01 sim590

Something needs to be edited inside the JAR file; solution here: https://anwaarlabs.wordpress.com/2017/02/25/lucene-an-spi-class-of-type-org-apache-lucene-codecs-codec-with-name-does-not-exist/

I can patch the existing release, but I'll have to think of a way to automate this for future releases.

sylvainhalle avatar Jan 19 '19 13:01 sylvainhalle
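
To check whether a given JAR is affected before patching it: the error above says the classpath only knows the names [Lucene40, Lucene41], and Lucene discovers those through the provider files under META-INF/services/ inside the JAR. Below is a minimal Python sketch that lists what a JAR registers for one service. It is not part of TeXtidote or LanguageTool, and the function name `list_service_providers` is made up for illustration.

```python
import zipfile

SERVICE_PREFIX = "META-INF/services/"


def list_service_providers(jar_path, service):
    """Return the provider class names that the JAR registers for `service`.

    `service` is the fully qualified interface name, for example
    "org.apache.lucene.codecs.PostingsFormat". Returns an empty list when
    the JAR has no provider file for that service.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read(SERVICE_PREFIX + service)
        except KeyError:
            return []
    providers = []
    for line in raw.decode("utf-8").splitlines():
        # Provider files allow '#' comments and blank lines; skip both.
        line = line.split("#", 1)[0].strip()
        if line:
            providers.append(line)
    return providers


# Example (the JAR file name is an assumption):
# for cls in list_service_providers("textidote.jar",
#                                   "org.apache.lucene.codecs.PostingsFormat"):
#     print(cls)
```

If no `lucene50` class shows up in the output for `org.apache.lucene.codecs.PostingsFormat`, the JAR is likely to fail in the way shown in the stack trace.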

Release v0.7.1 should fix the problem. Feel free to reopen if the problem persists.

sylvainhalle avatar Jan 19 '19 17:01 sylvainhalle

I just tested 0.7.1. It worked fine! Thanks for the quick reaction!

sim590 avatar Jan 19 '19 19:01 sim590

Thank you for the great program. Unfortunately I get exactly this error when I want to use the latest version (0.8.3) with current n-gram data. Is there a solution for this?

inventionate avatar Dec 30 '21 08:12 inventionate

The new LanguageTool jar seems to have the exact same issue as the previous one. I'll reopen and try to fix it again.

sylvainhalle avatar Jan 02 '22 15:01 sylvainhalle

I’m getting this exact error, but can’t fix it even after following the instructions at https://anwaarlabs.wordpress.com/2017/02/25/lucene-an-spi-class-of-type-org-apache-lucene-codecs-codec-with-name-does-not-exist/.

Is there a workaround simple enough for non-Java users?

Jollywatt avatar Mar 02 '22 06:03 Jollywatt

I "patched" the 0.8.3 version manually and it works for me. Use at your own risk. I edited those three files in META-INF using vim.

https://transfer.sh/64Oy0T/textidote_patched.jar

bong0 avatar Apr 02 '23 19:04 bong0

@bong0 Thanks for your contribution. I would like to integrate your changes in the pipeline that creates the LanguageTool fat JAR. Could you please tell me which files you modified and what changes you made to them?

sylvainhalle avatar Apr 04 '23 13:04 sylvainhalle

Sure, hope that helps: I added the lines listed in the diffs. Let me know if there's something unclear :)

[changed] META-INF/services/org.apache.lucene.codecs.PostingsFormat

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.PostingsFormat  t2/**/META-INF/services/org.apache.lucene.codecs.PostingsFormat
17a18
> org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat

[changed] META-INF/services/org.apache.lucene.codecs.DocValuesFormat

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.DocValuesFormat  t2/**/META-INF/services/org.apache.lucene.codecs.DocValuesFormat
20a21
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat

[changed] META-INF/services/org.apache.lucene.codecs.Codec

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.Codec  t2/**/META-INF/services/org.apache.lucene.codecs.Codec           
24a25
> org.apache.lucene.codecs.lucene54.Lucene54Codec

bong0 avatar Apr 06 '23 20:04 bong0
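
For anyone who would rather not edit the JAR by hand, the three diffs above could be applied with a script along these lines. This is a minimal Python sketch, not the project's build pipeline: the function name `patch_jar` and the file names in the example are assumptions, and it appends each line at the end of the provider file instead of at the exact line numbers shown in the diffs, on the assumption that the position does not matter. Use at your own risk, and keep a copy of the original JAR.

```python
import zipfile

# Provider file -> class name to register, taken from the diffs above.
ADDITIONS = {
    "META-INF/services/org.apache.lucene.codecs.PostingsFormat":
        "org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat",
    "META-INF/services/org.apache.lucene.codecs.DocValuesFormat":
        "org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat",
    "META-INF/services/org.apache.lucene.codecs.Codec":
        "org.apache.lucene.codecs.lucene54.Lucene54Codec",
}


def patch_jar(src, dst):
    """Copy the JAR at `src` to `dst`, adding the missing provider lines.

    Entries other than the three provider files are copied unchanged.
    A provider file that does not exist in `src` is not created.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info)
            extra = ADDITIONS.get(info.filename)
            if extra is not None:
                lines = data.decode("utf-8").splitlines()
                # Only append when the class is not already registered,
                # so running the script twice does not duplicate the line.
                if extra not in (line.strip() for line in lines):
                    lines.append(extra)
                data = ("\n".join(lines) + "\n").encode("utf-8")
            zout.writestr(info, data)


# Example (file names are assumptions):
# patch_jar("textidote.jar", "textidote_patched.jar")
```

Writing to a new file instead of modifying the JAR in place means the original stays available if the patched copy does not work.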