spellcheck-github-actions

Problem with UTF-8 character in wordlist file

Open · jonasbn opened this issue 2 years ago · 4 comments

I am observing an issue with the action in the repository jonasbn/perl-task-date-holidays.

The word Rezić is reported as a spelling mistake even though it is listed in the wordlist file (.wordlist.txt).

REF: jonasbn/perl-task-date-holidays/.wordlist.txt at commit 150683d26f8dfdc07d1f97a07991b506416d0cfc (linked by commit because HEAD has since been altered).

This is the configuration:

```yaml
matrix:
- name: Markdown
  aspell:
    lang: en
    ignore-case: true
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8
```

REF: perl-task-date-holidays/.spellcheck.yaml

jonasbn · Oct 02 '22 09:10

Try specifying Rezic (a plain c instead of ć) in your English dictionary. The problem may simply be due to how Aspell normalizes characters in an English dictionary.
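A minimal Python sketch of that workaround, assuming the goal is to derive the ASCII-folded spelling (Rezic) from the original (Rezić) before adding it to the wordlist: decompose the word with Unicode NFD and drop the combining marks. The helper name `fold_diacritics` is hypothetical and not part of pyspelling or Aspell.

```python
import unicodedata


def fold_diacritics(word: str) -> str:
    """Return `word` with combining diacritical marks removed (hypothetical helper)."""
    # NFD splits a precomposed letter such as "ć" into "c" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep every code point that is not a combining mark (combining class 0).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("Rezić"))  # Rezic
```

This only folds letters that have a canonical decomposition; letters such as đ or ø have none and pass through unchanged.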

facelessuser · Oct 02 '22 13:10

Thanks @facelessuser, I will try that.

jonasbn · Oct 02 '22 13:10

I'm digging into the settings. I mainly spell-check English text, so I rarely deal with foreign words and haven't explored all of the Unicode normalization options. There may be a better approach, but I'll have to experiment to find it.
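One normalization pitfall worth ruling out, sketched in Python as an assumption rather than the confirmed cause of this issue: the same visible word can be stored with a precomposed ć (NFC) or as c followed by a combining accent (NFD), and the two strings only compare equal once both are normalized to the same form.

```python
import unicodedata

nfc = "Rezi\u0107"   # precomposed U+0107 LATIN SMALL LETTER C WITH ACUTE
nfd = "Rezic\u0301"  # "c" followed by U+0301 COMBINING ACUTE ACCENT

# Visually identical, but different code point sequences.
print(nfc == nfd)          # False
print(len(nfc), len(nfd))  # 5 6

# Normalizing both sides to NFC makes them identical.
print(unicodedata.normalize("NFC", nfc) == unicodedata.normalize("NFC", nfd))  # True
```

If the wordlist and the Markdown sources used different forms, a plain string comparison would treat them as different words even though both render as Rezić.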

facelessuser · Oct 02 '22 13:10

From the Aspell documentation:

If a word contains a character that the language can’t handle it will still be ignored (for example a Cyrillic letter in a Latin based language).

I suspect this is simply an issue with using certain characters, such as ć, within an English dictionary.
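To find which wordlist entries might be affected, a small Python sketch can try encoding each entry in the dictionary's 8-bit character set. It assumes the English Aspell dictionary uses ISO-8859-1 (the `charset` line of the installed language data file, e.g. `en.dat`, is the place to confirm this). The function name `unrepresentable_words` is hypothetical.

```python
def unrepresentable_words(words, charset="iso-8859-1"):
    """Return the words that cannot be encoded in `charset` (assumed dictionary charset)."""
    bad = []
    for word in words:
        try:
            word.encode(charset)
        except UnicodeEncodeError:
            # At least one character (e.g. "ć", U+0107) is outside the charset.
            bad.append(word)
    return bad


print(unrepresentable_words(["Rezić", "Rezic", "café"]))  # ['Rezić']
```

Here é is part of ISO-8859-1 while ć is not, so only Rezić is flagged. To check a real wordlist, pass `Path(".wordlist.txt").read_text(encoding="utf-8").split()` (with `from pathlib import Path`).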

facelessuser · Oct 02 '22 14:10