
Add a large number of symbols to the skippable words config.

Open simjanos-dev opened this issue 1 year ago • 0 comments

The config/linguacafe.php file contains a config array called words_to_skip. Words in this array are marked as ignored when they are imported, and are not counted in the learned and read word statistics.

A large number of symbols are missing from it, and the missing symbols are language-specific. To fix this issue:

  • Import a large amount of text in every language.
  • Write a MySQL query that returns all words that are 1 character long (or maybe 2, though I'm not sure there are any).
  • Look through the list, and add any missing symbols to the words_to_skip array. Numbers don't have to be added to it.
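
The review step above could be partly automated. The following is only a minimal sketch in Python: the query string uses a hypothetical table and column name (not taken from LinguaCafe's schema), and `is_symbol_only` and `missing_symbols` are illustrative helpers, not existing code. It treats a token as a symbol when none of its characters is a Unicode letter, combining mark, or number, so numbers are left out as noted above.

```python
import unicodedata

# Hypothetical query: the table and column names are assumptions,
# not LinguaCafe's actual schema. CHAR_LENGTH counts characters, not bytes.
CANDIDATE_QUERY = "SELECT DISTINCT word FROM words WHERE CHAR_LENGTH(word) <= 2"


def is_symbol_only(token: str) -> bool:
    """Return True if the token is non-empty and contains no letters, marks, or digits."""
    return bool(token) and all(
        unicodedata.category(ch)[0] not in ("L", "M", "N") for ch in token
    )


def missing_symbols(candidates, words_to_skip):
    """Return short symbol-only tokens that are not yet in the skip list."""
    known = set(words_to_skip)
    found = {
        word
        for word in candidates
        if len(word) <= 2 and is_symbol_only(word) and word not in known
    }
    return sorted(found)


if __name__ == "__main__":
    # In practice the candidates would come from running the query above.
    rows = ["a", "。", "«", "7", "!", "¿?", "は"]
    print(missing_symbols(rows, ["!"]))
```

The output would still need a manual look before anything is added to words_to_skip, since some languages may use characters that Unicode classifies as punctuation inside real words.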

This should probably be done after the next ~20 languages have been added, since the Python service has been reworked and adding more languages is no longer a problem.

The approach could potentially be reversed by rewriting the filter code so that it lists the alphabet characters for every language instead of the symbols. I don't know which is more efficient.
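
For comparison, the reversed approach might look roughly like the sketch below. Everything in it is an assumption for illustration: the `ALPHABETS` table, its language keys, and `should_skip` do not exist in LinguaCafe, and the two alphabets are only examples. A word is skipped when it contains no character from the language's alphabet.

```python
import unicodedata

# Illustrative alphabets only; the keys and contents are assumptions,
# not LinguaCafe's configuration.
ALPHABETS = {
    "norwegian": set("abcdefghijklmnopqrstuvwxyzæøå"),
    "spanish": set("abcdefghijklmnopqrstuvwxyzáéíóúüñ"),
}


def should_skip(word: str, language: str) -> bool:
    """Return True if the word has no character from the language's alphabet."""
    alphabet = ALPHABETS[language]
    # Normalize so accented letters compare as single precomposed characters.
    normalized = unicodedata.normalize("NFC", word).lower()
    return not any(ch in alphabet for ch in normalized)


if __name__ == "__main__":
    for token in ["«", "Niño", "¿", "42"]:
        print(token, should_skip(token, "spanish"))
```

Under this sketch, digits and any unlisted symbol are skipped automatically, so the trade-off is maintaining one alphabet per language instead of one growing symbol list.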

simjanos-dev, Jan 31 '24 19:01