Transliterate file names from non-Latin alphabets

Open arisboch opened this issue 1 year ago • 4 comments

Provide a description that is worded well enough to be understood

It'd be nice if yt-dlp could transliterate file names from non-Latin alphabets/writing systems (Cyrillic, Georgian, Arabic, Amharic, Hebrew, Chinese, etc.) to Latin, and maybe also the special letters used in Latin alphabets, such as the eszett, umlauts, etc. Would that be possible and practical?

arisboch avatar Feb 08 '25 14:02 arisboch

... transliterate file names from non-Latin alphabets/writing systems (Cyrillic, Georgian, Arabic, Amharic, Hebrew, Chinese, etc.) to Latin (and maybe also special letters used in Latin alphabets, such as the eszett, umlauts, etc.), ...

That looks like a tough task; see Challenges § Transliteration - Wikipedia.

... would that be possible and practical?

What's the reason for the request? Did you encounter bugs with the filesystem?

pzhlkj6612 avatar Feb 08 '25 17:02 pzhlkj6612

  • I sometimes see question marks instead of the non-Latin characters, but only in some programs.
  • Some programs, such as some terminal emulators, don't play well with right-to-left text (Hebrew, etc.).
  • I can't read Georgian, Arabic, Japanese, etc., so it'd be awesome if yt-dlp could do the transliteration for me automatically.

arisboch avatar Feb 08 '25 17:02 arisboch

I imagine some languages could be transliterated using simple character replacement.

You've specifically requested Hebrew, but since no one bothers typing Hebrew vowels (niqqud), Hebrew would transliterate extremely poorly without a dictionary.
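
To illustrate what plain character replacement looks like, here is a minimal Python sketch using only the standard library. The mapping table is a hypothetical, incomplete one for Russian Cyrillic, and `transliterate` is an illustrative name, not anything that exists in yt-dlp. Diacritics are stripped via Unicode decomposition, so "ü" becomes "u" rather than the German convention "ue". Scripts that are not in the table, Hebrew included, pass through unchanged.

```python
import unicodedata

# Hypothetical, incomplete lowercase table (Russian letters only).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

# Latin letters that Unicode decomposition does not reduce to ASCII.
LATIN_SPECIAL = {"ß": "ss", "æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O"}


def build_table():
    """Build a str.translate table covering lowercase and uppercase letters."""
    table = dict(LATIN_SPECIAL)
    for src, dst in CYRILLIC_TO_LATIN.items():
        table[src] = dst
        table[src.upper()] = dst.capitalize()
    return str.maketrans(table)


TABLE = build_table()


def transliterate(text):
    """Replace mapped characters, then strip combining marks (umlauts, accents)."""
    # Compose first so that letters such as "й" match the single-character keys.
    composed = unicodedata.normalize("NFC", text)
    replaced = composed.translate(TABLE)
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate("Москва"))
print(transliterate("Straße"))
```

A one-to-one table like this cannot recover information the script does not write down, which is exactly the Hebrew problem described above.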

gamer191 avatar Feb 09 '25 14:02 gamer191

This is not something that should be done in yt-dlp, imo. Once the data has been written into the filename, it can be transformed afterwards, either from the filename itself or, maybe more easily, from the info JSON.
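
A rough sketch of that post-processing approach in Python, assuming the download was made with `--write-info-json` and the default output template, so that a `<name>.info.json` file with `title`, `id` and `ext` fields sits next to the media file. The helper names (`ascii_fold`, `planned_name`, `rename_from_info`) are hypothetical, and `ascii_fold` is only a crude stand-in for a real transliteration function: it strips diacritics and replaces everything else non-ASCII with `_`.

```python
import json
import unicodedata
from pathlib import Path


def ascii_fold(text):
    """Placeholder transliteration: strip diacritics, replace other non-ASCII with '_'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if ch.isascii() else "_" for ch in stripped)


def planned_name(info, transliterate=ascii_fold):
    """Build a new file name from an info dict with 'title', 'id' and 'ext' keys."""
    title = transliterate(info["title"])
    # Path separators are not allowed inside a single file name.
    title = title.replace("/", "_").replace("\\", "_")
    return f"{title} [{info['id']}].{info['ext']}"


def rename_from_info(info_json_path):
    """Rename the media file that sits next to '<name>.info.json' (assumed layout)."""
    info_json_path = Path(info_json_path)
    info = json.loads(info_json_path.read_text(encoding="utf-8"))
    base = info_json_path.name[: -len(".info.json")]
    media = info_json_path.with_name(f"{base}.{info['ext']}")
    target = media.with_name(planned_name(info))
    media.rename(target)
    return target


print(planned_name({"title": "Über uns", "id": "abc123", "ext": "mp4"}))
```

Working from the info JSON keeps the original title intact on disk, so a better transliteration can be applied later without downloading again.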

Grub4K avatar Feb 09 '25 16:02 Grub4K