Transliterate file names from non-Latin alphabets
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
- [x] I understand that I will be blocked if I intentionally remove or skip any mandatory* field
Checklist
- [x] I'm requesting a feature unrelated to a specific site
- [x] I've looked through the README
- [x] I've verified that I have updated yt-dlp to nightly or master (update instructions)
- [x] I've searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
- [x] I've read the guidelines for opening an issue
Provide a description that is worded well enough to be understood
It'd be nice if yt-dlp could transliterate file names from non-Latin alphabets/writing systems (Cyrillic, Georgian, Arabic, Amharic, Hebrew, Chinese, etc.) to Latin (and maybe also special letters used in Latin alphabets, such as the eszett, umlauts, etc.), would that be possible and practical?
Provide verbose output that clearly demonstrates the problem
- [ ] Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
- [ ] If using API, add 'verbose': True to YoutubeDL params instead
- [ ] Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
Complete Verbose Output
... transliterate file names from non-Latin alphabets/writing systems (Cyrillic, Georgian, Arabic, Amharic, Hebrew, Chinese, etc.) to Latin (and maybe also special letters used in Latin alphabets, such as the eszett, umlauts, etc.), ...
Looks like a tough task to do; see the "Challenges" section of the Wikipedia article "Transliteration".
... would that be possible and practical?
What's the reason? Did you encounter some bugs with the filesystem?
- I sometimes see question marks instead of the non-Latin characters, but only in some programs
- Some programs, such as some terminal emulators, don't play well with right-to-left text (Hebrew, etc.)
- I can't read Georgian, Arabic, Japanese, etc., so it'd be awesome if yt-dlp could do the transliteration automatically for me
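The first two symptoms can be detected up front with Python's standard `unicodedata` module. `display_risks` is a hypothetical helper, not part of yt-dlp. It flags names containing any non-ASCII character, which might show up as question marks in programs that assume a narrower encoding or lack a suitable font. It also flags names containing characters whose bidirectional class is right-to-left (`R` for Hebrew, `AL` for Arabic).

```python
import unicodedata


def display_risks(filename: str) -> set[str]:
    """Return which display problems a file name might trigger (hypothetical helper)."""
    risks = set()
    if not filename.isascii():
        # Any non-ASCII character may be shown as '?' by programs that
        # use a narrower encoding or lack a suitable font
        risks.add("non_ascii")
    if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in filename):
        # Right-to-left letters: Hebrew has bidi class "R", Arabic has "AL"
        risks.add("rtl")
    return risks
```

For example, a Cyrillic name only gets `non_ascii`, while a Hebrew or Arabic name gets both flags.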
I imagine some languages could be transliterated using character replacement
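Character replacement could look roughly like this minimal sketch. The `CHAR_MAP` table and `transliterate` function are hypothetical. They cover only Russian Cyrillic and the eszett, and they are not a standard romanization scheme. Accented Latin letters fall back to Unicode NFKD decomposition with the combining marks dropped. Characters with no entry, such as Georgian here, pass through unchanged.

```python
import unicodedata

# Hypothetical, deliberately incomplete table: Russian Cyrillic (lowercase
# keys) plus the German eszett. Not a standard romanization scheme.
CHAR_MAP = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
    "ß": "ss",
}


def transliterate(text: str) -> str:
    # Compose first so letters like "й" are matched as one code point
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CHAR_MAP:
            repl = CHAR_MAP[lower]
            # Uppercase source letter: capitalize the replacement
            out.append(repl.capitalize() if ch != lower else repl)
        else:
            # Fallback: decompose (ü -> u + diaeresis) and drop combining marks;
            # characters without a decomposition are kept as they are
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

With this sketch, `Привет, Straße` becomes `Privet, Strasse`. The NFKD fallback turns `ü` into `u` rather than the German convention `ue`, and all-caps input gives results like `ShchI`. A usable scheme would need per-language tables and rules.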
You've specifically requested Hebrew, but since no one bothers typing Hebrew vowel points (niqqud), Hebrew would transliterate extremely poorly without a dictionary
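The niqqud problem shows up immediately with a naive letter-by-letter table. `HEBREW_MAP` and `naive_hebrew` are a hypothetical sketch, not a standard romanization. Each letter always maps to the same Latin string, so vav always becomes `v` even where it actually stands for a vowel.

```python
import unicodedata

# Hypothetical consonant-only table, one fixed Latin string per letter
HEBREW_MAP = {
    "א": "", "ב": "b", "ג": "g", "ד": "d", "ה": "h", "ו": "v", "ז": "z",
    "ח": "ch", "ט": "t", "י": "y", "כ": "k", "ך": "k", "ל": "l", "מ": "m",
    "ם": "m", "נ": "n", "ן": "n", "ס": "s", "ע": "", "פ": "p", "ף": "f",
    "צ": "ts", "ץ": "ts", "ק": "k", "ר": "r", "ש": "sh", "ת": "t",
}


def naive_hebrew(text: str) -> str:
    # Drop niqqud (combining marks) if any, then map letter by letter;
    # characters missing from the table are kept unchanged
    return "".join(
        HEBREW_MAP.get(ch, ch) for ch in text if not unicodedata.combining(ch)
    )
```

Without vowels, `שלום` (shalom) comes out as `shlvm`. `ספר` yields `spr` whether it was meant as sefer ("book") or sapar ("barber"). Choosing between those readings requires a dictionary or context.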
This is not something that should be done in yt-dlp, imo. Once the data has been written into the filename, it can be transformed afterwards, either from the filename itself or, maybe more easily, from the info JSON.
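Post-processing from the info JSON could be sketched as follows. This assumes the download was run with `--write-info-json`, so a `.info.json` file with `title`, `id` and `ext` fields sits next to the media file. It also assumes the new name should follow yt-dlp's default output template pattern, `%(title)s [%(id)s].%(ext)s`. `ascii_fold`, `new_name_from_info` and `rename_from_info_json` are hypothetical helpers. `ascii_fold` is only a placeholder that strips accents and replaces any remaining non-ASCII character with `_`, so a real transliterator would be passed in its place.

```python
import json
import unicodedata
from pathlib import Path
from typing import Callable


def ascii_fold(text: str) -> str:
    """Placeholder transliterator: drop accents, replace other non-ASCII with '_'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c if c.isascii() else "_" for c in stripped)


def new_name_from_info(info: dict, transliterate: Callable[[str], str] = ascii_fold) -> str:
    """Build '<title> [<id>].<ext>' from info JSON fields, with a transliterated title."""
    title = transliterate(info["title"])
    # Keep path separators out of the resulting file name
    title = title.replace("/", "_").replace("\\", "_")
    return f"{title} [{info['id']}].{info['ext']}"


def rename_from_info_json(
    info_json: Path, media_file: Path, transliterate: Callable[[str], str] = ascii_fold
) -> Path:
    """Rename media_file next to itself using the metadata in info_json."""
    info = json.loads(info_json.read_text(encoding="utf-8"))
    target = media_file.with_name(new_name_from_info(info, transliterate))
    media_file.rename(target)
    return target
```

Keeping the transliterator as a parameter means the renaming logic stays the same whichever per-language scheme (or third-party library) ends up doing the actual conversion.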