Polish has issues with letters like ś, ń, ż, ą, ę. Maybe it's something about how the fonts are decoded?
The Gradio demo has this issue, and ComfyUI has it too: it just won't say the letters properly. It either skips them or says them the wrong way. It does say the letter ł, so not all of them are wrong, but what is causing this?
OK, this code fixes it. It's something about decomposing the letters into a base letter plus a combining mark.
0<0# : ^
'''
@echo off
set script=%~f0
python -x "%script%" %*
exit /b 0
'''
import tkinter as tk
from tkinter import scrolledtext

# Mapping of letters to decomposed forms
POLISH_DECOMPOSE_MAP = {
    "ą": "a\u0328", "ę": "e\u0328", "ć": "c\u0301", "ń": "n\u0301",
    "ś": "s\u0301", "ź": "z\u0301", "ż": "z\u0307", "ó": "o\u0301",
    "Ą": "A\u0328", "Ę": "E\u0328", "Ć": "C\u0301", "Ń": "N\u0301",
    "Ś": "S\u0301", "Ź": "Z\u0301", "Ż": "Z\u0307", "Ó": "O\u0301",
    # Ł/ł stays as-is
}

def decompose_polish_text(text):
    """Replace all Polish special letters with decomposed forms"""
    return "".join(POLISH_DECOMPOSE_MAP.get(char, char) for char in text)

def on_decompose():
    input_text = input_box.get("1.0", tk.END).rstrip()
    decomposed_text = decompose_polish_text(input_text)
    output_box.delete("1.0", tk.END)
    output_box.insert(tk.END, decomposed_text)
    # Copy the result to the clipboard so it can be pasted into the TTS prompt
    root.clipboard_clear()
    root.clipboard_append(decomposed_text)

# Setup GUI
root = tk.Tk()
root.title("Polish Decomposer for Chatterbox TTS")

tk.Label(root, text="Input Polish text:").pack(pady=(10, 0))
input_box = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=60, height=10)
input_box.pack(padx=10, pady=5)

decompose_button = tk.Button(root, text="DECOMPOSE", command=on_decompose)
decompose_button.pack(pady=5)

tk.Label(root, text="Decomposed output:").pack(pady=(10, 0))
output_box = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=60, height=10)
output_box.pack(padx=10, pady=5)

root.mainloop()
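For anyone wondering why ł is the odd one out: a quick way to inspect this is to ask Python's standard unicodedata module for the canonical decomposition (NFD) of each letter. The sketch below is only an illustration, and the helper name describe is made up for this example. It suggests that the hand-written map above matches what NFD produces, and that ł has no canonical decomposition at all, which would explain why it stays as a single code point.

```python
import unicodedata

POLISH_LETTERS = "ąęćńśźżółĄĘĆŃŚŹŻÓŁ"

def describe(ch):
    """Return the code points of ch after canonical decomposition (NFD).

    Hypothetical helper for illustration only; not part of any TTS library.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return [f"U+{ord(c):04X}" for c in decomposed]

# Print ASCII-only output so the script works on any console encoding.
for ch in POLISH_LETTERS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", describe(ch))
```

Letters that come back as two code points (a base letter plus a combining mark) are the ones the map rewrites. Ł/ł come back unchanged.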
It's great that you found the solution! It also has problems with Greek. Do you think you could find a similar way to fix Greek words as well? The problem is that Greek sounds like a mix of other languages, although it clones the voice correctly. Maybe the model needs more training on languages other than English? Thanks in advance, and well done on finding a solution for Polish.
I can confirm that applying:
unicodedata.normalize('NFKD', prompt)
to my prompt fixes Polish outputs.
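In case it helps, here is a minimal sketch of wrapping that call so every prompt is decomposed before it reaches the model. The function name prepare_prompt is an assumption made for this example, and the actual TTS call is left out because it depends on your setup. Note that NFKD also applies compatibility mappings (for example, the "ﬁ" ligature becomes "fi"), whereas NFD only does the canonical decomposition. Which of the two the model prefers is not something this thread has tested.

```python
import unicodedata

def prepare_prompt(prompt, form="NFKD"):
    """Decompose accented letters into base letter + combining mark.

    Hypothetical helper: 'form' may be "NFKD" (as reported in this thread)
    or "NFD" (canonical decomposition only).
    """
    return unicodedata.normalize(form, prompt)

text = prepare_prompt("żółć")
# Show the result with non-ASCII characters escaped, to make the
# combining marks visible regardless of console encoding.
print(ascii(text))

# Pass 'text' to your TTS call here instead of the raw prompt.
```

Normalizing is idempotent, so running already-decomposed text through prepare_prompt again leaves it unchanged.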
In fact, that solves it for Swedish too. Good work, @msxpwr.