Possible memory leak? Runs out of memory.
I created a docker image for node-markdown-spellcheck which uses hunspell-spellchecker.
$ docker run -ti -v $(pwd):/workdir tmaier/markdown-spellcheck --dictionary /usr/share/hunspell/de_DE_neu --report "README.md"
This test uses the de_DE_neu dictionary from http://download.services.openoffice.org/contrib/dictionaries
When I run this command, I get the following error message:
<--- Last few GCs --->
[1:0x55b3359ca000] 29164 ms: Mark-sweep 1021.1 (1070.8) -> 1021.1 (1071.8) MB, 4025.9 / 0.0 ms allocation failure GC in old space requested
[1:0x55b3359ca000] 33138 ms: Mark-sweep 1021.1 (1071.8) -> 1021.1 (1040.8) MB, 3974.7 / 0.0 ms last resort gc
[1:0x55b3359ca000] 36722 ms: Mark-sweep 1021.1 (1040.8) -> 1021.1 (1040.8) MB, 3584.2 / 0.0 ms last resort gc
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x3d3e795c0d39 <JS Object>
1: _parseDIC [/usr/local/lib/node_modules/markdown-spellcheck/node_modules/hunspell-spellchecker/lib/dictionary.js:~229] [pc=0x106d2785a283](this=0x265ad9de4ce9 <a Dictionary with map 0x379614986fc9>,data=0x3d3e79504311 <undefined>)
2: parse [/usr/local/lib/node_modules/markdown-spellcheck/node_modules/hunspell-spellchecker/lib/dictionary.js:61] [pc=0x106d27812331](this=0x265ad9de4ce9 <...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
This means it takes more than 1 GB of RAM... Memory leak?
It's the same with the it-IT dictionary...
@tmaier Could you write a minimal reproducible example that only uses this repo (hunspell-spellchecker)? That way we can make sure markdown-spellcheck isn't parsing the dictionary multiple times.
That said, it's possible that it's simply inefficient memory usage instead of a memory leak.
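One way to tell the two cases apart is to repeat the same work several times and watch the heap. The sketch below is only an illustration: `heapGrowth` is a hypothetical helper, not part of hunspell-spellchecker, and the workload is a stand-in that builds a plain object with one entry per word. In practice the callback would wrap the dictionary parse call.

```javascript
// Hypothetical helper: runs `work` several times and records the V8 heap
// usage (in bytes) after each run.
function heapGrowth(work, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    work();
    // global.gc is only defined when node is started with --expose-gc.
    if (typeof global.gc === "function") global.gc();
    samples.push(process.memoryUsage().heapUsed);
  }
  return samples;
}

// Stand-in workload (an assumption for illustration): one map entry per word.
const samples = heapGrowth(() => {
  const words = {};
  for (let i = 0; i < 100000; i++) words["word" + i] = [];
  return words;
});

// Print each sample in megabytes.
console.log(samples.map((bytes) => (bytes / 1024 / 1024).toFixed(1) + " MB"));
```

If the samples stay roughly flat from run to run, a single parse is simply expensive. If they keep climbing, something is being retained between runs, which would point to a leak. The readings are more stable when the script is run with `node --expose-gc`.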
A quick look at the code indicates that it allocates a full map entry for each word, so there is going to be some memory overhead compared to a trie.
Depending on your use case, if you only need to detect spelling errors without providing suggestions, you could generate a Bloom filter from the dictionary, which could be much more memory-efficient (<1 MB of RAM) at the cost of occasional false positives.
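To make the Bloom filter idea concrete, here is a minimal sketch. It is not code from hunspell-spellchecker: the `BloomFilter` class, the hash function, and the sizes are all assumptions chosen for illustration, and it assumes the word list has already been expanded from the .dic and .aff files. A Bloom filter never reports a stored word as missing, but it can occasionally accept a word that was never added.

```javascript
// Minimal Bloom filter sketch. Sizes and hash choice are illustrative assumptions.
class BloomFilter {
  constructor(bits, hashes) {
    this.bits = bits;       // total number of bits in the filter
    this.hashes = hashes;   // number of bit positions set per word
    this.bytes = new Uint8Array(Math.ceil(bits / 8));
  }

  // FNV-1a-style 32-bit hash over UTF-16 code units, with a seed XORed
  // into the offset basis.
  static hash(str, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Double hashing: position_i = (h1 + i * h2) mod bits.
  *positions(word) {
    const h1 = BloomFilter.hash(word, 0);
    const h2 = (BloomFilter.hash(word, 0x9e3779b9) | 1) >>> 0;
    for (let i = 0; i < this.hashes; i++) {
      yield (h1 + i * h2) % this.bits;
    }
  }

  add(word) {
    for (const p of this.positions(word)) {
      this.bytes[p >> 3] |= 1 << (p & 7);
    }
  }

  has(word) {
    for (const p of this.positions(word)) {
      if ((this.bytes[p >> 3] & (1 << (p & 7))) === 0) return false;
    }
    return true;
  }
}

// 8 * 1024 * 1024 bits is exactly 1 MB of backing storage.
const filter = new BloomFilter(8 * 1024 * 1024, 7);
for (const word of ["eye", "Mario", "hello"]) filter.add(word);

console.log(filter.has("eye"));   // true: stored words are always found
console.log(filter.has("tll"));   // very likely false, but false positives are possible
```

With about 10 bits per word and 7 hash functions, the false-positive rate is roughly 1%, so a 1 MB filter could hold several hundred thousand words at that rate. The trade-off is that the filter can only answer "is this word probably known?" and cannot produce suggestions.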
I updated the docker command above. The image is located at https://hub.docker.com/r/tmaier/markdown-spellcheck/.
Just run it against any README.md
I am experiencing the same problem, and it is simple to reproduce. I am using the dictionaries from https://github.com/wooorm/dictionaries
This problem happens with the it and pt dictionaries.
To reproduce it, run node bin/hunspell-tojson.js LANG and it will break. I get that error on node v6.9.1, but on node >9 it just never finishes.
Here is an example:
const fs = require("fs");
const path = require("path");
const Spellchecker = require("hunspell-spellchecker");

const spellchecker = new Spellchecker();

// Directory of the installed dictionary package.
// Swap in "dictionary-it" to test the Italian dictionary.
const base = path.dirname(require.resolve("dictionary-en"));

// Parse the .aff and .dic files into a dictionary object.
const DICT = spellchecker.parse({
  aff: fs.readFileSync(path.join(base, "index.aff")),
  dic: fs.readFileSync(path.join(base, "index.dic"))
});
spellchecker.use(DICT);

// Check a few words and print whether each one is spelled correctly.
for (const word of ["tll", "eye", "Mario", "mario"]) {
  console.log(spellchecker.check(word));
}
Install the dictionary-it and dictionary-en packages first. Here are the results, first with dictionary-it and then with dictionary-en:
PS D:\Projects\Spell> node .\hun.js
<--- Last few GCs --->
[25776:000001F30D2E5EB0] 23831 ms: Mark-sweep 1684.3 (1719.3) -> 1684.3 (1688.1) MB, 2156.3 / 0.0 ms (average mu = 0.202, current mu = 0.000) last resort GC in old space requested
[25776:000001F30D2E5EB0] 25831 ms: Mark-sweep 1684.3 (1688.1) -> 1684.3 (1688.1) MB, 1999.9 / 0.0 ms (average mu = 0.119, current mu = 0.000) last resort GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: 00007FF66493CEDF v8::internal::wasm::DisjointAllocationPool::~DisjointAllocationPool+74383
2: 00007FF6648E5A16 v8::base::CPU::has_sse+62326
3: 00007FF6648E68B6 v8::base::CPU::has_sse+66070
4: 00007FF66514BBDE v8::Isolate::ReportExternalAllocationLimitReached+94
5: 00007FF665130DA4 v8::SharedArrayBuffer::Externalize+772
6: 00007FF664FEFF7C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1452
7: 00007FF664FEBE11 v8::internal::Heap::AllocateExternalBackingStore+1649
8: 00007FF665006497 v8::internal::Factory::AllocateRawArray+183
9: 00007FF66500CE7B v8::internal::Factory::NewFixedArrayWithFiller+75
10: 00007FF665005CCA v8::internal::Factory::InternalizeStringWithKey<v8::internal::SequentialStringKey<unsigned short> >+74
11: 00007FF664E42F0D v8::internal::HashTable<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::EnsureCapacity+205
12: 00007FF664E38D49 v8::internal::Dictionary<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::Add+105
13: 00007FF664E38AE6 v8::internal::BaseNameDictionary<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::Add+118
14: 00007FF664D3859C v8::internal::Runtime::GetObjectProperty+2092
15: 00007FF6651D348D v8::internal::SetupIsolateDelegate::SetupHeap+465869
16: 00000338E1FE85E0
PS D:\Projects\Spell>
PS D:\Projects\Spell>
PS D:\Projects\Spell> node .\hun.js
false
true
true
false
@tmaier Has this issue been fixed? Thank you.
Hi @loretoparisi, I don't know. I never tried again.
Can you try it out and report back? If it is fixed, we can close this issue.
I just checked it with the updated Docker image from above. The bug is still there.
$ docker run --rm -ti -v $(pwd):/workdir tmaier/markdown-spellcheck:latest --dictionary /usr/share/hunspell/de_DE_comb "README.md"
<--- Last few GCs --->
[1:0x55f5e2cba0a0] 39898 ms: Mark-sweep (reduce) 1686.8 (1746.0) -> 1686.8 (1714.0) MB, 3177.0 / 0.0 ms (average mu = 0.281, current mu = 0.000) last resort GC in old space requested
[1:0x55f5e2cba0a0] 42854 ms: Mark-sweep (reduce) 1686.8 (1714.0) -> 1686.8 (1714.0) MB, 2955.6 / 0.0 ms (average mu = 0.170, current mu = 0.000) last resort GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
@tmaier thank you