lollms-webui
Non-ASCII characters are stripped from the prompt
Expected Behavior
When a prompt contains non-ASCII characters, all of them should be passed to the model.
Current Behavior
Non-ASCII characters are stripped from the prompt. For languages with accents (e.g. French), the model then sees text with missing characters and tends to imitate it in its output.
Steps to Reproduce
- Provide a prompt or a personality instruction that contains accented characters
- Look in the logs for "Received message :" printed before generation starts
Possible Solution
Unknown. The characters seem to be removed somewhere before start_message_generation is called in api/__init__.py.
Context
Tried to generate text with instructions in French.
The database contains the accents and the personality loads them correctly, but the content passed to start_message_generation has been stripped of accents.
NOTE: installed in Docker
Screenshots
N/A
The problem is in the regex used by the clean_string function. Changing the line to:
pattern = f'[^a-zA-Z0-9\u00C0-\u017F\s{re.escape(punctuation_chars)}]'
keeps the accents, and the model then behaves correctly.
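To illustrate the difference, here is a minimal standalone sketch. It is not the actual lollms-webui code: the two function names are hypothetical, and it assumes punctuation_chars is equivalent to string.punctuation and that clean_string simply removes every character matching the pattern. It also uses a raw f-string so that \s and \u00C0 are interpreted by the re module rather than by Python's string parser.

```python
import re
import string

# Assumption: stand-in for the punctuation set used by the real clean_string.
punctuation_chars = string.punctuation


def clean_string_ascii_only(text: str) -> str:
    """Hypothetical 'before' version: keeps only ASCII letters, digits,
    whitespace and punctuation, so accented letters are removed."""
    pattern = rf'[^a-zA-Z0-9\s{re.escape(punctuation_chars)}]'
    return re.sub(pattern, '', text)


def clean_string_with_accents(text: str) -> str:
    """Hypothetical 'after' version: also keeps U+00C0 to U+017F,
    which covers the accented Latin letters used in French."""
    pattern = rf'[^a-zA-Z0-9\u00C0-\u017F\s{re.escape(punctuation_chars)}]'
    return re.sub(pattern, '', text)


if __name__ == "__main__":
    sample = "Café déjà vu"
    print(clean_string_ascii_only(sample))    # accents are dropped
    print(clean_string_with_accents(sample))  # accents are preserved
```

Note that the range U+00C0 to U+017F only covers accented Latin letters, so text in other scripts (Cyrillic, CJK, and so on) would still be stripped by this pattern.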
I can create a pull request soon if you want.
Thanks a lot. This is now fixed in V6.7alpha1.