
Features of the sample text for languages

Open: antlarr opened this issue 3 years ago • 1 comment

I noticed the plugins directory contains some free books that are used to extract language characteristics (n-grams), which I guess feed the autocomplete feature.
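To illustrate why the choice of sample text matters, here is a minimal Python sketch of how word bigrams could be counted from a text and used to suggest the next word. It is not the plugin's actual code: the function names `build_bigrams` and `suggest` and the sample sentence are hypothetical and exist only for this example.

```python
import re
from collections import Counter, defaultdict


def build_bigrams(text):
    """Count, for each word, which words follow it in the sample text."""
    words = re.findall(r"\w+", text.lower())
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def suggest(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]


# Made-up sample sentence (assumption, not taken from any shipped book).
sample = "el gato come pescado y el gato duerme y el perro come"
model = build_bigrams(sample)
print(suggest(model, "el"))   # followers of "el", most frequent first
print(suggest(model, "zzz"))  # a word that never occurs gives no suggestions
```

Because suggestions can only come from word pairs that occur in the sample, a text from the 17th century will propose 17th-century continuations and will never propose words coined later.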

In the case of Spanish, I saw that the book "Don Quijote de la Mancha" is used as the sample text. This is good because the book is long and has a large vocabulary. The problem is that it was written in 1605-1615, so it uses quite a lot of old Spanish vocabulary and expressions that are no longer used, and it doesn't include the words and expressions that have appeared since then.

So I think it would be good to find a substitute text.

Apart from it being free for (re)distribution, are there any special features that the text should have?

antlarr, Jun 25 '21 18:06

> Apart from it being free for (re)distribution. Are there any special features that the text should have?

Generally, I think we've relied on texts that are in the public domain, which of course have the common problem of being too old.

The autocompletion and autocorrection features definitely need some major improvements. I'm not entirely sure what to do there yet, though.

dobey, Sep 16 '21 17:09