Checking of long documents fails with LanguageTool Premium API
Describe the bug
Long documents are not checked when the LanguageTool HTTP API is used; requests seem to fail with HTTP status code 413. Checking a (short) selection of the document works as expected. The full document is also checked correctly when the local LanguageTool instance is used (i.e., when no languageToolHttpServerUri is configured). Here, a long document means any document with, e.g., 8,000 words and 50,000 characters.
Steps to reproduce
Open a long document in VS Code and wait forever for the language check results (or watch the failure in the LTeX Language Server log).
Expected behavior
The document is fully checked. If the text is too long, I would expect it to be split into multiple separate requests that get processed successfully. In the worst case, though, it should at least show a visible warning to the user instead of failing silently.
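For illustration, here is a minimal sketch of the kind of chunking I mean. The function name, the paragraph-based strategy, and the 20,000-character limit are assumptions for the sketch, not ltex-ls internals:

```kotlin
// Hypothetical sketch: split plain text at paragraph boundaries into pieces
// below some API limit, so each piece can be sent as a separate request.
fun splitIntoChunks(text: String, maxChars: Int = 20_000): List<String> {
    val chunks = mutableListOf<String>()
    val current = StringBuilder()

    for (paragraph in text.split("\n\n")) {
        // Flush the current chunk if appending this paragraph would exceed the
        // limit. (A single paragraph longer than maxChars would still need
        // further splitting, which this sketch omits.)
        if (current.isNotEmpty() && current.length + paragraph.length + 2 > maxChars) {
            chunks.add(current.toString())
            current.clear()
        }
        if (current.isNotEmpty()) current.append("\n\n")
        current.append(paragraph)
    }

    if (current.isNotEmpty()) chunks.add(current.toString())
    return chunks
}
```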
Sample document
Reproduction sample can be generated on https://www.lipsum.com/ by choosing 8000 words.
LTeX configuration
"ltex.languageToolHttpServerUri": "https://api.languagetoolplus.com/",
"ltex.languageToolOrg.username": "[removed]",
"ltex.languageToolOrg.apiKey": "[removed]"
LTeX LS log
Jan 27, 2023 7:19:22 PM org.bsplines.ltexls.server.DocumentChecker logTextToBeChecked
FINE: Checking the following text in language 'en-US' via LanguageTool: "[removed]"... (truncated to 100 characters)
Jan 27, 2023 7:19:23 PM org.bsplines.ltexls.languagetool.LanguageToolHttpInterface checkInternal
SEVERE: LanguageTool failed with HTTP status code 413
Jan 27, 2023 7:19:23 PM org.bsplines.ltexls.server.DocumentChecker checkAnnotatedTextFragment
FINE: Obtained 0 rule matches
Version information
- Operating system: Windows 11
- vscode-ltex: 13.1.0
- ltex-ls: no idea how to figure this out from the VS Code extension
I see the same issue on Emacs / lsp-ltex-ls (though I can't find a log that confirms the 413 error code).
For me, the limit seems to be around 20,000 characters, and according to https://languagetoolplus.com/http-api/#/default this would mean that my credentials don't really work... Do you have any hints on how to debug this?
Okay, I did some more checking by enabling logging.
- When I put a wrong username/API key into the config, I get 403; with my real credentials I see corrections for Premium rules, so the credentials work in general.
- When I try the API manually with the failing texts (using the web interface https://languagetoolplus.com/http-api/#/default), checking works.
- In the log I see a lot of these:
FINEST: annotatedTextParts = [TEXT("L"), TEXT("o"), TEXT("r"), TEXT("e"), TEXT("m"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("i"), TEXT("p"), TEXT("s"), TEXT("u"), TEXT("m"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("d"), TEXT("o"), TEXT("l"), TEXT("o"), TEXT("r"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("s"), TEXT("i"), TEXT("t"), MARKUP(" "), …
Are the texts actually sent like this in JSON, i.e., split into a separate element for every single character? If so, the request body might become far too large.
Edit: It seems the answer is yes: https://github.com/valentjn/ltex-ls/blob/1a5897683c2f913f80f336af2386e093d7e7cab2/src/main/kotlin/org/bsplines/ltexls/languagetool/LanguageToolHttpInterface.kt#L175-L203
I'm unsure if this is the root cause, but the loop can certainly be optimized to produce shorter JSON.
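To see why this blows up the request: if every character becomes its own JSON object along the lines of `{"text":"L"},{"text":"o"},…`, each character of the document costs roughly a dozen bytes of JSON, so a document of around 20,000 characters already produces a body far larger than the text itself, which would explain the 413. A minimal sketch of the coalescing idea, using simplified stand-in types rather than the real ltex-ls classes:

```kotlin
// Stand-in types; the real code uses LanguageTool's annotated-text classes,
// which also carry FAKE_CONTENT interpretations.
enum class Kind { TEXT, MARKUP }
data class Part(val kind: Kind, val content: String)

// Merge adjacent parts of the same kind, so each run of characters becomes a
// single JSON object instead of one object per character.
fun coalesce(parts: List<Part>): List<Part> {
    val merged = mutableListOf<Part>()
    for (part in parts) {
        val last = merged.lastOrNull()
        if (last != null && last.kind == part.kind) {
            merged[merged.size - 1] = last.copy(content = last.content + part.content)
        } else {
            merged.add(part)
        }
    }
    return merged
}

// Coalescing TEXT("L"), TEXT("o"), TEXT("r"), TEXT("e"), TEXT("m")
// yields the single part TEXT("Lorem").
```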
Okay, this is the root cause... I have some local changes that optimize the JSON output. I can open a PR soon.
That would be very nice :+1:
See #228, which works well for me locally.
Still, more should be done. We should at least truncate the JSON output at the API limit. We could also split it into multiple requests, but I'm not convinced that this is much better, because then you easily hit the per-minute limits.
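For the truncation part, a guard along these lines could work. Everything here is a sketch, not the actual implementation: the names are made up, the 150,000 figure is the documented Premium limit mentioned later in this thread, and it truncates the plain text rather than the raw JSON so the request stays well-formed:

```kotlin
// Hypothetical guard: truncate the text at an assumed API limit and surface a
// warning, instead of letting the server answer 413 and checking nothing.
const val API_CHAR_LIMIT = 150_000

fun enforceLimit(text: String, warn: (String) -> Unit): String {
    if (text.length <= API_CHAR_LIMIT) return text
    warn("Text exceeds the API limit of $API_CHAR_LIMIT characters; " +
        "only the first $API_CHAR_LIMIT characters will be checked.")
    return text.take(API_CHAR_LIMIT)
}
```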
By the way, it's still a good idea to set ltex.checkFrequency to "save" to avoid hitting the API limits. This makes Premium much less useful, and not clearly better than the open-source version. I complained about the low limits at https://forum.languagetool.org/t/disappointing-api-limits-for-premium/8728; feel free to join me if this also bothers you.
I still see this problem in VS Code. Was the extension updated on the VS Code marketplace? Should I install something manually?
I tried a nightly build from the releases section of the GitHub repository. VS Code shows "Starting LTeX..." at the bottom forever. The LTeX Language Server output shows:
[Info - 7:20:58 PM] Starting ltex-ls...
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Jul 25, 2023 7:21:01 PM org.bsplines.ltexls.server.LtexLanguageServer initialize
INFO: ltex-ls 16.0.1-alpha.1.nightly.2023-07-25 - initializing...
Jul 25, 2023 7:21:01 PM org.bsplines.ltexls.tools.I18n setLocale
I tried changing the Java runtime, but that didn't help. How can I use the fix for this problem?
I get the same problem using nvim v0.9.1 and ltex-ls v16.0.0 integrated via null-ls. However, the same file gets checked correctly in VS Code with vscode-ltex v13.1.0.
@danielnaber I reached out to LanguageTooler GmbH. I explained to support that the limit of 150,000 characters is a fake, because the limit includes the characters of the augmented text, not just the actual text being checked, and users cannot control the former. Since this is entirely unexpected for any user, it amounts to being lied to about the product when users buy it. I suggested setting the limit on the characters of the actual text. Support replied, “This is not intended to be changed,” and ghosted; they didn’t comment on the point about misrepresenting the product. I tried talking to a lawyer in the US, but he said that the company being based in Germany makes it difficult. So, if you are in Germany, you may be able to file a consumer fraud case.
Meanwhile, I've done the minimum I could: I have canceled my Premium subscription. The funny part is that they sent me an automated email asking me to cancel my cancellation. Within the sales pitch, among other things, they mentioned that “It can also check longer texts with up to 100,000 characters.” I replied that 100,000 characters is a lie, because the limit counts an arbitrary amount of augmentation, not the actual text. They didn't answer. Oh well. I switched to Grammarly.