andaluh-py Problem with dotless i

Open FernanOrtega opened this issue 4 years ago • 0 comments

There is a problem when trying to transcribe a word such as "Cacık" that contains a special character, in this case the dotless i (ı, U+0131).

I reviewed the code where the exception is raised:

    def replace_const_end_with_case(match):
        repl_rules = {
            'a': 'â', 'A': 'Â', 'á': 'â', 'Á': 'Â',
            'e': 'ê', 'E': 'Ê', 'é': 'ê', 'É': 'Ê',
            'i': 'î', 'I': 'Î', 'í': 'î', 'Í': 'Î',
            'o': 'ô', 'O': 'Ô', 'ó': 'ô', 'Ó': 'Ô',
            'u': 'û', 'U': 'Û', 'ú': 'û', 'Ú': 'Û'
        }

        word = match.group(0)
        prefix = match.group(1)
        suffix_vowel = match.group(2)
        suffix_const = match.group(3)

        else_cond = any(
            s in prefix
            for s in ('á', 'é', 'í', 'ó', 'ú', 'Á', 'É', 'Í', 'Ó', 'Ú'))

        if word.lower() in list(WORDEND_CONST_RULES_EXCEPT.keys()):
            return keep_case(word, WORDEND_CONST_RULES_EXCEPT[word.lower()])
        elif else_cond:
            return prefix + repl_rules[suffix_vowel]
        else:
            if suffix_const.isupper():
                return prefix + repl_rules[suffix_vowel] + 'H'
            else:
                return prefix + repl_rules[suffix_vowel] + 'h' # <--- EXACTLY HERE
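
One plausible way to reach the marked line, sketched under stated assumptions: the issue does not show the regular expression that feeds this function, so the `WORDEND` pattern below is hypothetical and is not andaluh-py's actual rule. It assumes the pattern is compiled with `re.IGNORECASE`, in which case Python's `re` treats "ı" as a case variant of "i", captures it as the vowel group, and the `repl_rules` lookup then fails with a `KeyError`.

```python
import re

# Trimmed copy of the mapping quoted above (only the 'i' entries).
repl_rules = {'i': 'î', 'I': 'Î', 'í': 'î', 'Í': 'Î'}

# HYPOTHETICAL pattern, not andaluh-py's actual rule: prefix, final vowel,
# final consonant, compiled case-insensitively.
WORDEND = re.compile(r'(\w*?)([aeiouáéíóú])([bcdfgjkptvx])\b', re.IGNORECASE)

match = WORDEND.search('Cacık')
prefix, suffix_vowel, suffix_const = match.groups()

# Under IGNORECASE, 'ı' (U+0131) matches the 'i' in the vowel class,
# so the dotless i ends up in the vowel group.
print(repr(suffix_vowel))

# The mapping has no entry for the dotless i, so the lookup fails.
try:
    repl_rules[suffix_vowel]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

If the real pattern is not case-insensitive, the dotless i must enter the vowel group some other way, and this sketch does not apply.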

I think that there are two choices to solve this:

- Add more entries to the repl_rules map.
- Filter out words that contain characters that are not normally used in Spanish.
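
Both choices can be sketched roughly as follows. This is an illustration only: `EXTRA_RULES`, `SPANISH_LETTERS`, and `is_spanish_word` are hypothetical names that do not exist in andaluh-py, mapping "ı" to "î" is an assumption about the desired output, and the alphabet used for filtering is a guess at what "normally used in Spanish" should mean.

```python
# Choice 1: extend the mapping with the missing character.
# ASSUMPTION: the dotless i should be transcribed like a regular 'i'.
repl_rules = {'i': 'î', 'I': 'Î', 'í': 'î', 'Í': 'Î'}  # trimmed copy
EXTRA_RULES = {'ı': 'î'}  # hypothetical extra entry
extended_rules = {**repl_rules, **EXTRA_RULES}

# Choice 2: skip words that contain letters outside the Spanish alphabet.
# ASSUMPTION: this set is what counts as "normally used in Spanish".
SPANISH_LETTERS = set('abcdefghijklmnñopqrstuvwxyzáéíóúü')


def is_spanish_word(word):
    """Return True if every character lowercases to a Spanish letter."""
    return all(ch.lower() in SPANISH_LETTERS for ch in word)


for word in ('Madrid', 'pingüino', 'Cacık'):
    # A word that fails the check would be left untranscribed.
    print(word, is_spanish_word(word))
```

The first choice only helps for characters that have been anticipated, while the second is a blanket guard, so the two could also be combined.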

I'll check other kinds of characters to learn more about this problem.

Jul 30 '20 09:07 FernanOrtega
