
Possible memory leak? Runs out of memory.

Open tmaier opened this issue 9 years ago • 9 comments

I created a docker image for node-markdown-spellcheck which uses hunspell-spellchecker.

$ docker run -ti -v $(pwd):/workdir tmaier/markdown-spellcheck --dictionary /usr/share/hunspell/de_DE_neu --report "README.md"

This test uses the de_DE_neu dictionary from http://download.services.openoffice.org/contrib/dictionaries

When I run this command, I get the following error message:

<--- Last few GCs --->

[1:0x55b3359ca000]    29164 ms: Mark-sweep 1021.1 (1070.8) -> 1021.1 (1071.8) MB, 4025.9 / 0.0 ms  allocation failure GC in old space requested
[1:0x55b3359ca000]    33138 ms: Mark-sweep 1021.1 (1071.8) -> 1021.1 (1040.8) MB, 3974.7 / 0.0 ms  last resort gc
[1:0x55b3359ca000]    36722 ms: Mark-sweep 1021.1 (1040.8) -> 1021.1 (1040.8) MB, 3584.2 / 0.0 ms  last resort gc


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x3d3e795c0d39 <JS Object>
    1: _parseDIC [/usr/local/lib/node_modules/markdown-spellcheck/node_modules/hunspell-spellchecker/lib/dictionary.js:~229] [pc=0x106d2785a283](this=0x265ad9de4ce9 <a Dictionary with map 0x379614986fc9>,data=0x3d3e79504311 <undefined>)
    2: parse [/usr/local/lib/node_modules/markdown-spellcheck/node_modules/hunspell-spellchecker/lib/dictionary.js:61] [pc=0x106d27812331](this=0x265ad9de4ce9 <...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

This means it takes more than 1 GB of RAM... Memory leak?
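For context, the roughly 1 GB ceiling in the log looks like V8's heap limit rather than the machine's total RAM. A minimal sketch for printing the limit currently in effect, using Node's built-in v8 module (the variable name limitMB is only illustrative):

```javascript
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to MB for readability.
const limitMB = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log(`V8 heap limit: ${limitMB.toFixed(0)} MB`);
```

Starting node with a larger --max-old-space-size value raises this limit, which may help tell whether parsing eventually completes or just keeps growing.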

tmaier avatar Dec 17 '16 15:12 tmaier

It's the same with the it-IT dictionary...

Enrico204 avatar Aug 16 '17 12:08 Enrico204

@tmaier Could you write a minimal reproducible example that only uses this repo (hunspell-spellchecker)? That way we can make sure markdown-spellcheck isn't parsing the dictionary multiple times.

That said, it's possible that it's simply inefficient memory usage instead of a memory leak.
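One rough way to tell the two apart is to measure heap growth around a single call. The sketch below is an assumption-laden illustration: measureHeapGrowth is a hypothetical helper, not part of hunspell-spellchecker, and the workload is a stand-in that creates one map entry per word.

```javascript
// Hypothetical helper: reports how much the JS heap grew while `work` ran.
function measureHeapGrowth(work) {
  // globalThis.gc only exists when node is started with --expose-gc.
  if (globalThis.gc) globalThis.gc();
  const before = process.memoryUsage().heapUsed;
  const result = work();
  if (globalThis.gc) globalThis.gc();
  const after = process.memoryUsage().heapUsed;
  return { result, growthMB: (after - before) / (1024 * 1024) };
}

// Stand-in workload (assumption): one object property per "word".
const { result, growthMB } = measureHeapGrowth(() => {
  const table = {};
  for (let i = 0; i < 100000; i++) table["word" + i] = [];
  return table;
});
console.log(Object.keys(result).length, growthMB.toFixed(1) + " MB");
```

To test the library, replace the stand-in with a call to spellchecker.parse(...). If one call already uses most of the heap, that suggests inefficient memory usage. If growth keeps accumulating across repeated calls whose results are discarded, that suggests a leak.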

A quick look at the code indicates that it allocates a full map entry for each word, so there is going to be some memory overhead compared to a trie.

Depending on your use case, if you only need to detect spelling errors without providing suggestions, you could generate a Bloom filter from the dictionary, which could be much more memory-efficient (<1 MB of RAM).
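A minimal sketch of that idea, assuming you already have the fully expanded word list (affix rules applied). The BloomFilter class, the bit count, the number of hashes, and the seeded FNV-1a hashing are all illustrative choices, not anything hunspell-spellchecker provides:

```javascript
// Minimal Bloom filter sketch; sizes and hash choice are illustrative assumptions.
class BloomFilter {
  constructor(bitCount, hashCount) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // 32-bit FNV-1a over UTF-16 code units, with a seed mixed into the basis.
  static fnv1a(text, seed) {
    let hash = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < text.length; i++) {
      hash ^= text.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
  }

  // Derive hashCount bit positions via double hashing: h1 + i * h2.
  positions(word) {
    const h1 = BloomFilter.fnv1a(word, 0);
    const h2 = (BloomFilter.fnv1a(word, 0x9e3779b9) | 1) >>> 0; // force odd, unsigned
    const result = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.bitCount);
    }
    return result;
  }

  add(word) {
    for (const pos of this.positions(word)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // false means "definitely not added"; true means "probably added".
  has(word) {
    return this.positions(word).every(
      (pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0
    );
  }
}

const filter = new BloomFilter(4 * 1024 * 1024, 7); // 4 Mi bits = 512 KiB
for (const word of ["eye", "Mario"]) filter.add(word);
console.log(filter.has("eye"));   // true: added words are always reported present
console.log(filter.has("Mario")); // true
console.log(filter.has("tll"));   // expected false, barring a rare false positive
```

The trade-off is that a Bloom filter can report a misspelled word as correct with a small probability, and it cannot produce suggestions. It never rejects a word that was added.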

AaronO avatar Aug 16 '17 14:08 AaronO

I updated the docker command above. The image is located at https://hub.docker.com/r/tmaier/markdown-spellcheck/.

Just run it against any README.md

tmaier avatar Aug 16 '17 16:08 tmaier

I am experiencing the same problem, and it is simple to reproduce. I am using the dictionaries from https://github.com/wooorm/dictionaries, and the problem happens with the it and pt dictionaries. Just run node bin/hunspell-tojson.js LANG and it will break. I get that error on Node v6.9.1, but on Node >9 it just never finishes.

ghost avatar Mar 06 '18 13:03 ghost

Here is an example:

const fs = require("fs");
const path = require("path");
const Spellchecker = require("hunspell-spellchecker");

const spellchecker = new Spellchecker();

// Locate the directory of the installed dictionary package.
// Use "dictionary-it" here instead to run the Italian case.
const base = require.resolve("dictionary-en").replace(/index\.js$/, "");

// Parse the affix (.aff) and word list (.dic) files.
// The stack trace earlier in this thread points at parse/_parseDIC.
const DICT = spellchecker.parse({
    aff: fs.readFileSync(path.join(base, "index.aff")),
    dic: fs.readFileSync(path.join(base, "index.dic"))
});
spellchecker.use(DICT);

// Check a few words and print whether each is spelled correctly.
for (const word of ["tll", "eye", "Mario", "mario"]) {
    console.log(spellchecker.check(word));
}

You should install the dictionary-it and dictionary-en packages. Here are the results, first using dictionary-it, then using dictionary-en:

PS D:\Projects\Spell> node .\hun.js

<--- Last few GCs --->

[25776:000001F30D2E5EB0]    23831 ms: Mark-sweep 1684.3 (1719.3) -> 1684.3 (1688.1) MB, 2156.3 / 0.0 ms  (average mu = 0.202, current mu = 0.000) last resort GC in old space requested
[25776:000001F30D2E5EB0]    25831 ms: Mark-sweep 1684.3 (1688.1) -> 1684.3 (1688.1) MB, 1999.9 / 0.0 ms  (average mu = 0.119, current mu = 0.000) last resort GC in old space requested

<--- JS stacktrace --->

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 00007FF66493CEDF v8::internal::wasm::DisjointAllocationPool::~DisjointAllocationPool+74383
 2: 00007FF6648E5A16 v8::base::CPU::has_sse+62326
 3: 00007FF6648E68B6 v8::base::CPU::has_sse+66070
 4: 00007FF66514BBDE v8::Isolate::ReportExternalAllocationLimitReached+94
 5: 00007FF665130DA4 v8::SharedArrayBuffer::Externalize+772
 6: 00007FF664FEFF7C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1452
 7: 00007FF664FEBE11 v8::internal::Heap::AllocateExternalBackingStore+1649
 8: 00007FF665006497 v8::internal::Factory::AllocateRawArray+183
 9: 00007FF66500CE7B v8::internal::Factory::NewFixedArrayWithFiller+75
10: 00007FF665005CCA v8::internal::Factory::InternalizeStringWithKey<v8::internal::SequentialStringKey<unsigned short> >+74
11: 00007FF664E42F0D v8::internal::HashTable<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::EnsureCapacity+205
12: 00007FF664E38D49 v8::internal::Dictionary<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::Add+105
13: 00007FF664E38AE6 v8::internal::BaseNameDictionary<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::Add+118
14: 00007FF664D3859C v8::internal::Runtime::GetObjectProperty+2092
15: 00007FF6651D348D v8::internal::SetupIsolateDelegate::SetupHeap+465869
16: 00000338E1FE85E0
PS D:\Projects\Spell>
PS D:\Projects\Spell>
PS D:\Projects\Spell> node .\hun.js
false
true
true
false

crystalfp avatar Jun 17 '20 12:06 crystalfp

@tmaier Has this issue been fixed? Thank you.

loretoparisi avatar Oct 19 '20 08:10 loretoparisi

Hi @loretoparisi, I don't know. I never tried again.

Can you try it out and report back? If it is fixed, we can close this issue.

tmaier avatar Oct 24 '20 15:10 tmaier

I just checked it with the updated Docker image from above. The bug is still there.

$ docker run --rm -ti -v $(pwd):/workdir tmaier/markdown-spellcheck:latest --dictionary /usr/share/hunspell/de_DE_comb "README.md"

<--- Last few GCs --->

[1:0x55f5e2cba0a0]    39898 ms: Mark-sweep (reduce) 1686.8 (1746.0) -> 1686.8 (1714.0) MB, 3177.0 / 0.0 ms  (average mu = 0.281, current mu = 0.000) last resort GC in old space requested
[1:0x55f5e2cba0a0]    42854 ms: Mark-sweep (reduce) 1686.8 (1714.0) -> 1686.8 (1714.0) MB, 2955.6 / 0.0 ms  (average mu = 0.170, current mu = 0.000) last resort GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

tmaier avatar Oct 24 '20 23:10 tmaier

@tmaier thank you

loretoparisi avatar Oct 25 '20 18:10 loretoparisi