nspell
Question re: adding extra dictionaries
node 12.10.x, nspell 2.1.2
I'm not sure if this is a bug or my misinterpretation of the documentation.
Adding extra dictionaries via
new nspell([{aff: mainaff_buff, dic: maindic_buff}, {dic: extradic_buff}]), or
nspell_instance.dictionary(<extradic_buff>), or
nspell_instance.dictionary(['some','new','words'].join('\n'))
appears to cause all inputs to be considered correct by .correct and .suggest.
I do see that the nspell_instance object contains all of the words from both maindic and extradic as well as all of the affix information, but it does not seem to be used.
However, using .personal to add extra words does work as expected, and testing with both maindic and the extra words produces correct .correct and .suggest outputs.
I'm not sure if I am missing something in the documentation or if this is an issue. I am using the dictionary-en package to load the main aff & dic, and the extra dictionaries are plain word lists in UTF-8 loaded into a buffer.
Steps to reproduce:
A. baseline, single dictionary
const maindic = require('dictionary-en')
const nspell = new NSpell(maindic)
nspell.correct('ultrasonogram') // => false; OK b/c not in dictionary
nspell.correct('ultrasongram') // => false; OK
nspell.correct('feleing') // => false; OK
nspell.suggest('ultrasonogram') // => [ ]; OK
nspell.suggest('ultrasongram') // => [ ]; OK
nspell.suggest('feleing') // => ['feeling', 'fleeing', ...]; OK
B. add words via .dictionary (same behavior if it's another buffer passed in to constructor)
const maindic = require('dictionary-en')
const nspell = new NSpell(maindic)
nspell.dictionary(['ultrasonogram','ultrasonosurgery'].join('\n'))
nspell.correct('ultrasonogram') // => true
nspell.correct('ultrasongram') // => true; unexpected
nspell.correct('feleing') // => true; unexpected
nspell.suggest('ultrasonogram') // => []
nspell.suggest('ultrasongram') // => []
nspell.suggest('feleing') // => []
C. add words via .personal
const maindic = require('dictionary-en')
const nspell = new NSpell(maindic)
nspell.personal(['ultrasonogram','ultrasonosurgery'].join('\n'))
nspell.correct('ultrasonogram') // => true
nspell.correct('ultrasongram') // => false
nspell.correct('feleing') // => false
nspell.suggest('ultrasonogram') // => []
nspell.suggest('ultrasongram') // => ['ultrasonogram']; OK
nspell.suggest('feleing') // => ['feeling', 'fleeing', ...]; OK
First: dictionaries must start with a number of how many items they'll contain. And also: they're funky and have to be made to work with the other affix file, so it's probably better to work with .personal.
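For the count line, a minimal sketch of building dictionary text from a plain word list could look like the following. `toDicText` is a hypothetical helper, not part of nspell, and it only handles the leading count; it does nothing about making the entries work with the main affix file.

```javascript
// Hypothetical helper (not an nspell API): prefix a plain word list
// with the entry count that the .dic format expects on its first line.
function toDicText(words) {
  // Drop blank entries so the count matches the lines actually emitted.
  const entries = words.map((word) => word.trim()).filter((word) => word.length > 0)
  return [String(entries.length), ...entries].join('\n')
}

const extraDic = toDicText(['ultrasonogram', 'ultrasonosurgery'])
console.log(extraDic)
```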
But yup, I can reproduce this bug.
With this code:
const NSpell = require('nspell')
const en = require('dictionary-en')
en(function (err, maindic) {
if (err) throw err
const nspell = new NSpell(maindic)
nspell.dictionary(['2', 'ultrasonogram','ultrasonosurgery'].join('\n'))
console.log(nspell.correct('ultrasonogram')) // => true
console.log(nspell.correct('ultrasongram')) // => true; unexpected
console.log(nspell.correct('feleing')) // => true; unexpected
console.log(nspell.suggest('ultrasonogram')) // => []
console.log(nspell.suggest('ultrasongram')) // => []
console.log(nspell.suggest('feleing')) // => []
})
...and with console.log('source:', [rule, source]) added right before here, I get:
source: [
'n*1t',
'(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)'
]
source: [
'n*mp',
'(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)'
]
source: [
/(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)/i,
''
]
source: [
/(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)/i,
''
]
true
true
true
[]
[]
[]
The problem is that the regexes fail to work, leading to an empty regex (/(?:)/), which results in every word being marked as valid.
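To see why an empty regex has that effect, here is a small standalone sketch in plain JavaScript. It does not touch nspell internals; it only shows that a pattern built from an empty source string matches any input.

```javascript
// A RegExp built from an empty source is displayed as /(?:)/ and
// matches every string, since the empty pattern matches at position 0.
const empty = new RegExp('', 'i')
console.log(String(empty)) // '/(?:)/i'

const inputs = ['ultrasonogram', 'ultrasongram', 'feleing']
const results = inputs.map((word) => empty.test(word))
console.log(results) // [ true, true, true ]
```

Any check that relies on such a pattern to accept or reject a word will therefore accept everything.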
Thanks for the quick reply!
The application I'm concerned with pulls in a ~10,000-term Hunspell medical dictionary (which does have the count as the first line, though I completely forgot about it in the repro example).
It appears to work as expected and perform well when loading the whole dictionary with .personal (after editing the dic to remove the term count and the GPL license notice), so I'll continue using that approach.
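The manual editing step could be scripted along these lines. This is a rough sketch under stated assumptions: `dicToWordList` is a hypothetical helper, the license notice is assumed to sit on lines starting with `#`, and any `/FLAGS` suffix is simply dropped, so affixed forms from the original dictionary are lost.

```javascript
// Hypothetical helper (not an nspell API): turn .dic text into a plain
// one-word-per-line list suitable for passing to .personal.
// Assumption: notice/comment lines start with `commentPrefix`.
function dicToWordList(dicText, commentPrefix = '#') {
  const lines = dicText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith(commentPrefix))

  // Drop the leading entry count if present (a line made only of digits).
  if (lines.length > 0 && /^\d+$/.test(lines[0])) {
    lines.shift()
  }

  // Keep only the word itself, discarding any "/FLAGS" suffix.
  return lines.map((line) => line.split('/')[0]).join('\n')
}

const sample = ['# license notice', '2', 'ultrasonogram/S', 'ultrasonosurgery'].join('\n')
const personalText = dicToWordList(sample)
console.log(personalText)
// With an nspell instance in scope, this would then be: nspell.personal(personalText)
```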
I'd be interested in investigating more, though my knowledge of how the affix stuff works is zero.