cyrillic-transliteration
File and command line?
Could the tool take a .txt file as input and process it through a command such as "CyrTranslit -i text.txt"?
Great idea @andradadad! I will look into this.
I second this suggestion; a command line utility for transliteration would be great. Or maybe you have heard of something like this already in existence?
Hey @andradadad and @097115, try out the following branch and let me know if it meets your requirements: https://github.com/opendatakosovo/cyrillic-transliteration/tree/command_exec
Instructions
Sample command line call to transliterate a Russian text file:
python cyrtranslit.py -l RU -i tests/ru.txt -o tests/output.txt
Use the -c argument to accomplish the reverse, that is, to input Latin characters and output Cyrillic.
Use the -h argument for help.
Issues
- There is a potential scaling issue for large files because the entire input file is read before being processed.
- I have not had the opportunity to test thoroughly or to write test cases. Please report any issues or counter-intuitive behaviour that you encounter.
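One possible way to ease the scaling concern in the first bullet is to process the input line by line instead of reading it all at once. The sketch below is only an illustration: transliterate_stream and to_upper are hypothetical names, and to_upper merely stands in for the real transliteration function. It assumes the conversion works character by character, so that splitting the text at line boundaries does not change the result.

```python
import io


def transliterate_stream(src, dst, convert):
    """Read src one line at a time, convert each line, and write it to dst."""
    for line in src:
        dst.write(convert(line))


def to_upper(text):
    # Hypothetical stand-in for the real transliteration function.
    return text.upper()


# In-memory streams are used here; open file objects would work the same way.
src = io.StringIO("first line\nsecond line\n")
dst = io.StringIO()
transliterate_stream(src, dst, to_upper)
print(dst.getvalue(), end="")
```

With this shape, memory use depends on the longest line rather than on the size of the whole file.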
Seems to work OK with files, thanks (macOS 10.11, Python 3.7.4).
However, with pipes (like echo "АаБбВв" | ...), not so much :) Any suggestions for this use case, maybe?
(Also, speaking of the Cyrillic letter Ъ, you seem to transliterate it to #, which doesn't look like common practice. Is that OK?)
@097115 good catch. Pipes are now supported. Try this and let me know how it goes:
echo "АаБбВв" | python cyrtranslit.py -l RU
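For readers wondering how pipe support like this can be wired up, a common pattern with argparse is to let the input default to standard input and the output default to standard output when no file is given. This is a minimal sketch under that assumption, not the code in the branch: build_parser is a hypothetical helper, and only the -i and -o flags and the input_file attribute name are taken from this thread.

```python
import argparse
import sys


def build_parser():
    """Build a parser whose input and output fall back to stdin and stdout."""
    parser = argparse.ArgumentParser(description="Transliteration CLI sketch")
    # Without -i, the text is read from standard input, which makes pipes work.
    parser.add_argument("-i", dest="input_file",
                        type=argparse.FileType("r"), default=sys.stdin)
    # Without -o, the result is written to standard output.
    parser.add_argument("-o", dest="output_file",
                        type=argparse.FileType("w"), default=sys.stdout)
    return parser


# Parsing an empty argument list shows the fallback values.
args = build_parser().parse_args([])
print(args.input_file is sys.stdin)
```

The same parser still accepts -i and -o with file paths, so the file-based call from the instructions keeps working alongside the piped one.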
As for the problem with the Cyrillic letter Ъ, I will create a separate issue.
Python 3.8
File "C:\python36\Scripts\cyrillic-transliteration-command_exec\cyrtranslit.py", line 69, in <module>
text_input = args.input_file.read()
File "C:\python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 190: character maps to <undefined>
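The traceback shows the file being decoded with cp1252, the default text encoding on many Windows setups, so the failure may come from the platform encoding rather than from the Python version. Byte 0x8f has no mapping in cp1252, yet it occurs in UTF-8 Cyrillic text (for example, "я" is encoded as 0xD1 0x8F). A minimal sketch of one possible workaround, assuming the input file is UTF-8, is to pass the encoding explicitly when opening it. The temporary file here exists only to keep the example self-contained.

```python
import os
import tempfile

# Create a small UTF-8 file containing Cyrillic text.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("АаБбВв")
    path = tmp.name

# An explicit encoding avoids relying on the platform default (e.g. cp1252).
with open(path, encoding="utf-8") as handle:
    text_input = handle.read()

os.remove(path)
print(len(text_input))  # 6
```

If the script opens its files through argparse, argparse.FileType also accepts an encoding argument, for example argparse.FileType("r", encoding="utf-8").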
@andradadad thank you for the thorough testing!
Current support is only up to Python 3.7: https://github.com/opendatakosovo/cyrillic-transliteration/pull/13
If the lack of support for 3.8 is a blocker for you, I can address it as part of this Issue, but it will take a bit of time. If not, I can create another Issue and address it at a later date, so that I can focus on closing this Issue and merging it to master.
Seems to be working for me (macOS, Python 2.7, 3.7), thanks!
Also, when this makes its way to master, will there be any binary included in the distribution, like a simple wrapper around python /path/cyrtranslit.py?
@097115 I was thinking of just a shell script but maybe you have a better suggestion? I wouldn't do anything for the Windows environment though because I currently do not have a setup I could test it in.
@georgeslabreche, yeah, pretty much it, a simple shell script. Just to have it so we could use it in pipes right after installing the package.
Thanks once again!