chatterbox

Polish: letters like ś, ń, ż, ą, ę are not pronounced correctly. Maybe it's something about the character decoding?

Open • trollver9000 opened this issue 3 months ago • 4 comments

The Gradio demo has this issue, and ComfyUI has it too: it just won't say these letters properly. It either skips them or pronounces them the wrong way. It does say the letter ł correctly, so not all of them are wrong. What is causing this?

Posted by trollver9000 on Sep 07 '25, 14:09

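OK, this code fixes it. It's something to do with decomposing the letters: the script replaces each accented letter with its base letter followed by a combining accent mark.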

```python
0<0# : ^
'''
@echo off
set script=%~f0
python -x "%script%" %*
exit /b 0
'''
# The lines above let this file also run as a Windows .bat/.cmd: the batch part
# relaunches the same file with "python -x", which skips the first line.
import tkinter as tk
from tkinter import scrolledtext

# Mapping of letters to decomposed forms
POLISH_DECOMPOSE_MAP = {
    "ą": "a\u0328", "ę": "e\u0328", "ć": "c\u0301", "ń": "n\u0301",
    "ś": "s\u0301", "ź": "z\u0301", "ż": "z\u0307", "ó": "o\u0301",
    "Ą": "A\u0328", "Ę": "E\u0328", "Ć": "C\u0301", "Ń": "N\u0301",
    "Ś": "S\u0301", "Ź": "Z\u0301", "Ż": "Z\u0307", "Ó": "O\u0301",
    # Ł/ł stays as-is
}

def decompose_polish_text(text):
    """Replace all Polish special letters with decomposed forms"""
    return "".join(POLISH_DECOMPOSE_MAP.get(char, char) for char in text)

def on_decompose():
    input_text = input_box.get("1.0", tk.END).rstrip()
    decomposed_text = decompose_polish_text(input_text)
    output_box.delete("1.0", tk.END)
    output_box.insert(tk.END, decomposed_text)
    # Copy the result to the clipboard so it can be pasted into the TTS UI
    root.clipboard_clear()
    root.clipboard_append(decomposed_text)

# Setup GUI
root = tk.Tk()
root.title("Polish Decomposer for Chatterbox TTS")

tk.Label(root, text="Input Polish text:").pack(pady=(10, 0))
input_box = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=60, height=10)
input_box.pack(padx=10, pady=5)

decompose_button = tk.Button(root, text="DECOMPOSE", command=on_decompose)
decompose_button.pack(pady=5)

tk.Label(root, text="Decomposed output:").pack(pady=(10, 0))
output_box = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=60, height=10)
output_box.pack(padx=10, pady=5)

root.mainloop()
```

Posted by trollver9000 on Sep 07 '25, 15:09
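For reference, the hand-written map in the script above produces the same sequences as Unicode canonical decomposition (NFD), which Python's standard `unicodedata` module already implements. The sketch below checks that equivalence without the GUI; `decompose_nfd` is a hypothetical helper name used only for this comparison. It also confirms that ł is left alone, since it has no canonical decomposition.

```python
import unicodedata

# Copy of the hand-written map from the script (Ł/ł intentionally absent).
POLISH_DECOMPOSE_MAP = {
    "ą": "a\u0328", "ę": "e\u0328", "ć": "c\u0301", "ń": "n\u0301",
    "ś": "s\u0301", "ź": "z\u0301", "ż": "z\u0307", "ó": "o\u0301",
    "Ą": "A\u0328", "Ę": "E\u0328", "Ć": "C\u0301", "Ń": "N\u0301",
    "Ś": "S\u0301", "Ź": "Z\u0301", "Ż": "Z\u0307", "Ó": "O\u0301",
}

def decompose_polish_text(text: str) -> str:
    """Replace Polish precomposed letters with base letter + combining mark."""
    return "".join(POLISH_DECOMPOSE_MAP.get(ch, ch) for ch in text)

def decompose_nfd(text: str) -> str:
    """Hypothetical helper: canonical decomposition (NFD) via the stdlib."""
    return unicodedata.normalize("NFD", text)

if __name__ == "__main__":
    for precomposed, expected in POLISH_DECOMPOSE_MAP.items():
        # NFD yields exactly the sequence written by hand in the map.
        assert decompose_nfd(precomposed) == expected
    # ł has no canonical decomposition, so NFD leaves it unchanged.
    assert decompose_nfd("ł") == "ł"

    sample = "Zażółć gęślą jaźń"
    print(decompose_polish_text(sample) == decompose_nfd(sample))  # True

    # Show the two code points that "ś" becomes after decomposition.
    for ch in decompose_nfd("ś"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # U+0073 LATIN SMALL LETTER S
    # U+0301 COMBINING ACUTE ACCENT
```

Using the standard normalization instead of a fixed table means other accented letters are covered without extending the map by hand.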

It's great that you found the solution! It also has problems with Greek. Do you think you could find a similar way to fix Greek words? The problem is that the Greek output sounds like a mix of other languages, even though it clones the voice correctly. Maybe it needs more training for languages other than English? Thanks in advance, and well done finding a solution for Polish.

Posted by 4ever-AI on Sep 07 '25, 17:09

I can confirm that applying `unicodedata.normalize('NFKD', prompt)` to my prompt fixes Polish outputs.

Posted by msxpwr on Sep 18 '25, 21:09
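A minimal sketch of that one-liner as a reusable pre-processing step, assuming the text is normalized just before it is passed to the TTS model. The name `prepare_prompt` is hypothetical and not part of Chatterbox. NFKD performs the same canonical decomposition as NFD, so for Polish letters the result is identical to the hand-written map; it additionally applies compatibility mappings, for example turning the `ﬁ` ligature into plain `fi` and a no-break space into a regular space.

```python
import unicodedata

def prepare_prompt(prompt: str) -> str:
    """Hypothetical helper: NFKD-normalize text before sending it to TTS."""
    return unicodedata.normalize("NFKD", prompt)

if __name__ == "__main__":
    text = "Cześć, jak się masz?"
    normalized = prepare_prompt(text)
    # Same visible text, but ś, ć and ę are now two code points each.
    print(len(text), len(normalized))
    # For Polish letters NFKD and NFD agree.
    assert normalized == unicodedata.normalize("NFD", text)
    # NFKD also folds compatibility characters: "ﬁ" ligature and no-break space.
    assert prepare_prompt("\ufb01le\u00a0x") == "file x"
```

If the compatibility folding is unwanted (for instance, to keep superscripts or ligatures intact), `"NFD"` is the narrower choice and still decomposes the Polish letters.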

In fact, that solves it for Swedish too. Good work, @msxpwr!

Posted by pardeike on Oct 24 '25, 20:10
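The Swedish result fits the same pattern: å, ä and ö are also precomposed letters that NFKD splits into a base letter plus one combining mark. A quick sketch to inspect this; `decomposition_names` is a hypothetical helper written for this check only.

```python
import unicodedata

def decomposition_names(ch: str) -> list[str]:
    """Unicode names of the code points that `ch` becomes after NFKD."""
    return [unicodedata.name(p) for p in unicodedata.normalize("NFKD", ch)]

if __name__ == "__main__":
    # Each Swedish vowel becomes a base letter + a combining ring or diaeresis.
    for ch in "åäöÅÄÖ":
        print(ch, "->", " + ".join(decomposition_names(ch)))
```

The same check can be run on any other alphabet to see whether its accented letters decompose at all before trying this workaround.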