No Support for Cyrillic characters in file names
Please add support for Cyrillic (Unicode?) characters in file names. At the moment they are cut out of file names. For example, the title "Выполняем тестовое задание на Junior Python разработчика с зарплатой 70000р | PDF в MP3" is transformed into the file name "2022-05-01_pythontoday_junior-python-70000-pdf-mp3_Q0lHb-FCATk_1080p-vp9-opus.mkv".
This is already on the to-do list due to another issue. Initially TubeSync allowed full Unicode filenames, however this caused problems on some network shares that didn't fully support long Unicode filenames, so I went the other way and stripped out everything that wasn't a-z, A-Z or 0-9, plus a few hyphens etc. As you report, this isn't useful for any title that isn't in a Latin-based alphabet, which is too conservative the other way. I am testing a few libraries that allow "safe" Unicode for filenames and will probably integrate one of those when this gets fixed. Thanks for the issue.
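For anyone following along, the stripping behaviour described above can be approximated with the short sketch below. This is not TubeSync's actual code: the function name `ascii_only_slug` and the exact regex are assumptions made for illustration. It does reproduce the title fragment from the report.

```python
import re

def ascii_only_slug(title: str) -> str:
    """Hypothetical ASCII allow list: keep a-z, A-Z and 0-9, hyphenate the rest."""
    # Collapse every run of disallowed characters (Cyrillic, spaces, "|") into one hyphen.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", title)
    # Drop leading/trailing hyphens and lowercase what is left.
    return slug.strip("-").lower()

title = "Выполняем тестовое задание на Junior Python разработчика с зарплатой 70000р | PDF в MP3"
print(ascii_only_slug(title))  # junior-python-70000-pdf-mp3
```

Every Cyrillic word falls outside the allow list, so only the Latin words and digits survive.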
Maybe it would be good to use this: https://pypi.org/project/transliterate/
Bi-directional transliterator for Python. Transliterates (unicode) strings according to the rules specified in the language packs (source script <-> target script).
Comes with language packs for the following languages (listed in alphabetical order):
Armenian, Bulgarian (beta), Georgian, Greek, Macedonian (alpha), Mongolian (alpha), Russian, Serbian (alpha), Ukrainian (beta).
There are also a number of useful tools included, such as:
A simple lorem ipsum generator, which allows lorem ipsum generation in the language chosen; language detection for the text (if an appropriate language pack is available); and a slugify function for non-Latin texts.
Thanks, I had seen a couple of libraries like that. It would certainly help specifically for Cyrillic languages, but I was generally hoping to find an off-the-shelf solution that makes filenames "safe" for all languages (e.g. Japanese or Thai) and that can properly strip emojis. So far a character allow list approach has been much easier than attempting a block list, given how vast the Unicode space is. If need be I'll fall back to character packs for different languages, but this has to be a problem others have already found solutions for.
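One possible shape for a Unicode-aware allow list, using only the standard library, is sketched below. It is an illustration, not what TubeSync does or will do: the name `safe_unicode_filename`, the choice of kept categories and the 200-byte default are all assumptions. The idea is to allow characters by Unicode general category (letters, numbers, and combining marks that follow a kept character) instead of listing code points, and to cap the UTF-8 byte length, since long names were the original problem on network shares.

```python
import unicodedata

def safe_unicode_filename(title: str, max_bytes: int = 200) -> str:
    """Hypothetical category-based allow list; max_bytes is an assumed limit."""
    parts = []
    # NFKC folds compatibility forms (full-width letters, superscripts) first.
    for ch in unicodedata.normalize("NFKC", title):
        major = unicodedata.category(ch)[0]  # "L", "M", "N", "S", "P", "Z", ...
        if major in ("L", "N"):
            parts.append(ch.lower())
        elif major == "M" and parts and parts[-1] != "-":
            # Keep combining marks only when attached to a kept character,
            # so Thai vowel signs survive but emoji variation selectors do not.
            parts.append(ch)
        elif parts and parts[-1] != "-":
            # Anything else (spaces, punctuation, emoji) becomes a single hyphen.
            parts.append("-")
    name = "".join(parts).strip("-")
    # Truncate by encoded size; errors="ignore" drops a character cut in half.
    truncated = name.encode("utf-8")[:max_bytes]
    return truncated.decode("utf-8", errors="ignore").rstrip("-")

print(safe_unicode_filename("I \u2764\ufe0f you"))  # i-you
```

With this sketch the title from the report keeps its Cyrillic words instead of losing them. Whether names like that are acceptable on every target filesystem is a separate question that this does not answer.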