
Question re: adding extra dictionaries

Open • akotranza opened this issue 4 years ago • 2 comments

Node 12.10.x, nspell 2.1.2

I'm not sure if this is a bug or my misinterpretation of the documentation.

Adding extra dictionaries via any of

new nspell([{aff: mainaff_buff, dic: maindic_buff}, {dic: extradic_buff}])
nspell_instance.dictionary(extradic_buff)
nspell_instance.dictionary(['some', 'new', 'words'].join('\n'))

appears to make .correct accept every input and .suggest return no suggestions.

I do see that the nspell_instance object contains all of the words from both maindic and extradic as well as all of the affix information, but that information does not seem to be used 😕

However, adding the extra words with .personal works as expected: .correct and .suggest give correct results for words from both maindic and the extra list.

I'm not sure if I am missing something in the documentation or if this is a bug. I am using the dictionary-en package to load the main aff and dic, and the extra dictionaries are plain UTF-8 word lists loaded into a Buffer.

Steps to reproduce:

A. baseline, single dictionary

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)

  nspell.correct('ultrasonogram') // => false; OK b/c not in dictionary
  nspell.correct('ultrasongram') // => false; OK
  nspell.correct('feleing') // => false; 👍

  nspell.suggest('ultrasonogram') // => []; OK
  nspell.suggest('ultrasongram') // => []; OK
  nspell.suggest('feleing') // => ['feeling', 'fleeing', ...]; 👍
})

B. add words via .dictionary (same behavior when the extra dictionary is passed to the constructor as another buffer)

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  nspell.dictionary(['ultrasonogram', 'ultrasonosurgery'].join('\n'))

  nspell.correct('ultrasonogram') // => true
  nspell.correct('ultrasongram') // => true; 👎
  nspell.correct('feleing') // => true; 👎

  nspell.suggest('ultrasonogram') // => []
  nspell.suggest('ultrasongram') // => []
  nspell.suggest('feleing') // => []
})

C. add words via .personal

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  nspell.personal(['ultrasonogram', 'ultrasonosurgery'].join('\n'))

  nspell.correct('ultrasonogram') // => true
  nspell.correct('ultrasongram') // => false
  nspell.correct('feleing') // => false

  nspell.suggest('ultrasonogram') // => []
  nspell.suggest('ultrasongram') // => ['ultrasonogram'] 👍
  nspell.suggest('feleing') // => ['feeling', 'fleeing', ...] 👍
})

akotranza, Jul 22 '20 17:07

First: dictionaries must start with a line giving the number of entries they contain. Also, they're funky: their entries have to be made to work with the affix file they are loaded alongside, so it's probably better to use .personal.
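A minimal sketch of adding that count line, assuming the extra words arrive as a plain array; `toDic` is a made-up helper, not part of nspell. Hunspell's .dic format puts the (approximate) number of entries on line 1, and each following line is a word optionally followed by /FLAGS that refer to rules in the matching .aff file, which is why an extra dictionary has to be written against the same affix file. With nspell 2.1.2 the count alone does not avoid the bug reproduced below.

```javascript
// Hypothetical helper, not part of nspell: build .dic text from a plain
// word list by prepending the entry count Hunspell expects on line 1.
// The entries carry no /FLAGS, so no affix rules apply to them.
function toDic (words) {
  const entries = words
    .map((word) => word.trim())
    .filter((word) => word !== '') // skip blank entries
  return [String(entries.length), ...entries].join('\n')
}

console.log(toDic(['ultrasonogram', 'ultrasonosurgery']))
// 2
// ultrasonogram
// ultrasonosurgery
```

This produces the same text as the `['2', 'ultrasonogram', 'ultrasonosurgery'].join('\n')` used in the repro.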

But yup, I can reproduce this bug.

With this code:

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  nspell.dictionary(['2', 'ultrasonogram','ultrasonosurgery'].join('\n'))

  console.log(nspell.correct('ultrasonogram')) // => true
  console.log(nspell.correct('ultrasongram')) // => true; πŸ‘Ž 
  console.log(nspell.correct('feleing')) // => true; πŸ‘Ž 

  console.log(nspell.suggest('ultrasonogram')) // => []
  console.log(nspell.suggest('ultrasongram')) // => []
  console.log(nspell.suggest('feleing')) // => []
})

...and adding console.log('source:', [rule, source]) right before here, I get:

source: [
  'n*1t',
  '(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)'
]
source: [
  'n*mp',
  '(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)'
]
source: [
  /(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)/i,
  ''
]
source: [
  /(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)/i,
  ''
]
true
true
true
[]
[]
[]

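The problem is that the compound-rule regexes are rebuilt when the extra dictionary is added, and by then the rules are already RegExp objects instead of strings like 'n*1t' (see the last two log entries). Rebuilding them produces an empty source, i.e. an empty regex (/(?:)/), which matches anything, so every word is marked as valid 🤔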
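To illustrate the failure mode, here is a hypothetical, heavily simplified model of the rebuild step. It is not nspell's actual code: it only assumes, as the log suggests, that each rule is turned into a regex source by walking it character by character, and that a later .dictionary() call runs the same step over rules that were already compiled. `codes`, `ruleSource`, `rebuildBuggy` and `rebuildGuarded` are made-up names; the alternatives in `codes` are copied from the logged source for 'n*1t'.

```javascript
// Hypothetical, simplified model of the compound-rule rebuild.
// This is NOT nspell's source; names and structure are made up.

// Each one-character flag stands for the words carrying it, mirroring
// the alternatives in the logged source for 'n*1t'.
const codes = {
  n: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
  '1': ['1'],
  t: ['0th', '1th', '2th', '3th', '4th', '5th', '6th', '7th', '8th', '9th']
}

// Walk a rule such as 'n*1t' character by character into a regex source.
// A RegExp object has no `length`, so for one the loop never runs and
// '' is returned.
function ruleSource (rule) {
  let source = ''
  for (let i = 0; i < rule.length; i++) {
    const ch = rule[i]
    if (ch === '*' || ch === '?') {
      source += ch // quantifiers are copied as-is
    } else {
      source += '(?:' + (codes[ch] || []).join('|') + ')'
    }
  }
  return source
}

// Buggy: recompiles every rule, even ones that are already RegExp objects.
function rebuildBuggy (rules) {
  return rules.map((rule) => new RegExp(ruleSource(rule), 'i'))
}

// Guarded: leaves already-compiled rules untouched.
function rebuildGuarded (rules) {
  return rules.map((rule) =>
    rule instanceof RegExp ? rule : new RegExp(ruleSource(rule), 'i')
  )
}

const first = rebuildBuggy(['n*1t']) // as after the constructor
const second = rebuildBuggy(first) // as after nspell.dictionary(...)

console.log(first[0].source === ruleSource('n*1t')) // true
console.log(second[0].source) // (?:)
console.log(second[0].test('feleing')) // true: the empty pattern matches anything
console.log(rebuildGuarded(first)[0].test('feleing')) // false
```

Within this model, skipping already-compiled rules is one possible fix. Keeping the original rule strings and recompiling from them would be the more thorough option, because words added later with compound flags would then be picked up by the new regexes.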

wooorm, Jul 22 '20 19:07

Thanks for the quick reply!

The application I'm working on pulls in a ~10,000-term Hunspell medical dictionary (which does have the count as its first line, though I completely forgot about it in the repro example).

Loading the whole dictionary with .personal (after editing the .dic to remove the term count and the GPL license notice 😬) appears to work as expected and performs well, so I'll stick with that approach.
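The manual edit described above (dropping the count line and the license notice) could be scripted roughly as follows. `dicToPersonal` is a hypothetical name. The assumptions that the notice sits on lines starting with '#' and that entries may carry Hunspell /FLAGS are guesses about such a file, not facts from this thread. Stripping the flags also means affixed forms (plurals and the like) are no longer generated, so only the listed forms are accepted.

```javascript
// Hypothetical helper, not part of nspell: turn Hunspell .dic text into the
// one-word-per-line list passed to nspell's .personal().
// Assumptions: line 1 may be the entry count, and notices (such as a
// license) sit on lines starting with '#'. Check your own file.
function dicToPersonal (dicText) {
  const lines = dicText
    .replace(/^\uFEFF/, '') // remove a UTF-8 byte-order mark if present
    .split(/\r?\n/)

  // Drop a leading count line such as "10000".
  if (lines.length > 0 && /^\d+\s*$/.test(lines[0])) lines.shift()

  return lines
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'))
    // Strip affix flags ("term/S" -> "term"): in a personal dictionary a
    // slash is read as "word/model" instead. Escaped slashes are not handled.
    .map((line) => line.split('/')[0])
    .join('\n')
}

const dic = '2\n# GPL license notice\nultrasonogram/S\nultrasonosurgery\n'
console.log(dicToPersonal(dic))
// ultrasonogram
// ultrasonosurgery
```

The result could then be handed over with something like `nspell.personal(dicToPersonal(buf.toString('utf8')))`.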

I'd be interested in investigating further, though my knowledge of how the affix machinery works is zero.

akotranza, Jul 22 '20 20:07