File and command line?

Open andradadad opened this issue 4 years ago • 10 comments

It would be useful to pass in a .txt file and process it with a command such as "CyrTranslit -i text.txt".

andradadad avatar Aug 11 '19 21:08 andradadad

Great idea @andradadad! I will look into this.

georgeslabreche avatar Oct 15 '19 11:10 georgeslabreche

I second this suggestion; a command-line utility for transliteration would be great. Or maybe you have heard of something like this that already exists?

097115 avatar Mar 18 '20 10:03 097115

Hey @andradadad and @097115, try out the following branch and let me know if it meets your requirements: https://github.com/opendatakosovo/cyrillic-transliteration/tree/command_exec

Instructions

Sample command line call to transliterate a Russian text file: python cyrtranslit.py -l RU -i tests/ru.txt -o tests/output.txt

Use the -c argument to do the reverse, that is, to input Latin characters and output Cyrillic.

Use the -h argument for help.

Issues

  • There is a potential scaling issue for large files because the entire input file is read before being processed.
  • I have not had the opportunity to test thoroughly or to write test cases, so please report any issues or counter-intuitive behaviour that you encounter.
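One way the scaling concern above could be addressed is to process the input one line at a time instead of reading the whole file first. The following is only a minimal sketch, not the code in the branch: demo_to_latin and its tiny mapping are hypothetical stand-ins for the real transliteration call, and the approach assumes that no transliteration rule needs context spanning a line break.

```python
import io

# Hypothetical stand-in for the real transliteration function; only a few
# letters are mapped so that the example stays self-contained.
_DEMO_MAP = {
    u"А": u"A", u"а": u"a",
    u"Б": u"B", u"б": u"b",
    u"В": u"V", u"в": u"v",
}


def demo_to_latin(text):
    """Replace each mapped Cyrillic letter; leave everything else as-is."""
    return u"".join(_DEMO_MAP.get(ch, ch) for ch in text)


def transliterate_stream(src, dst, transliterate=demo_to_latin):
    """Read src line by line and write each transliterated line to dst.

    Only one line is held in memory at a time. Returns the number of
    lines processed.
    """
    count = 0
    for line in src:
        dst.write(transliterate(line))
        count += 1
    return count


if __name__ == "__main__":
    source = io.StringIO(u"АаБб\nВв\n")
    target = io.StringIO()
    transliterate_stream(source, target)
    print(target.getvalue())
```

Any file-like object works for src and dst, so the same function could be handed an open file or a standard stream.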

georgeslabreche avatar Mar 30 '20 22:03 georgeslabreche

Seems to work OK with files, thanks (macOS 10.11, Python 3.7.4).

However, with pipes (e.g. echo "АаБбВв" | ...), not so much :) Any suggestions for this use case?

(Also, speaking of the Cyrillic letter Ъ, you seem to transliterate it to #, which doesn't look like common practice. Is that OK?)

097115 avatar Mar 31 '20 07:03 097115

@097115 good catch. Pipes are now supported. Try this and let me know how it goes:

echo "АаБбВв" | python cyrtranslit.py -l RU

For the problem with Cyrillic letter Ъ, I will create another issue.
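For readers wondering how file and pipe input can share one code path, here is a minimal sketch using argparse. It is an illustration under assumptions, not the branch's actual code: the -l, -i, -o and -c flags come from this thread, but the long option names and the build_parser helper are hypothetical.

```python
import argparse
import sys


def build_parser():
    """Build a hypothetical parser mirroring the flags mentioned in this thread."""
    parser = argparse.ArgumentParser(description="Transliteration CLI sketch")
    parser.add_argument("-l", "--language", required=True,
                        help="language code, e.g. RU")
    # With no -i given, fall back to standard input so that pipes work.
    parser.add_argument("-i", "--input", dest="input_file",
                        type=argparse.FileType("r"), default=sys.stdin)
    # With no -o given, write the result to standard output.
    parser.add_argument("-o", "--output", dest="output_file",
                        type=argparse.FileType("w"), default=sys.stdout)
    parser.add_argument("-c", "--cyrillic", action="store_true",
                        help="convert Latin input to Cyrillic")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # The real tool would transliterate here; this sketch only echoes the text.
    args.output_file.write(args.input_file.read())
```

Because the default is the stream object itself rather than a file name, argparse passes it through unchanged when -i is omitted.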

georgeslabreche avatar Apr 01 '20 22:04 georgeslabreche

Python 3.8

  File "C:\python36\Scripts\cyrillic-transliteration-command_exec\cyrtranslit.py", line 69, in <module>
    text_input = args.input_file.read()
  File "C:\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 190: character maps to <undefined>
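The cp1252.py frame in the traceback suggests that the file was decoded with the Windows default code page rather than as UTF-8. A possible workaround, sketched below on the assumption that the input file is UTF-8 encoded, is to open it with an explicit encoding. The helper name read_text_utf8 is hypothetical and is not part of cyrtranslit.

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8, regardless of the platform default encoding."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Write a small UTF-8 sample to a temporary file, then read it back.
    fd, sample_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    with io.open(sample_path, "wb") as handle:
        handle.write(u"Привет, я".encode("utf-8"))
    print(read_text_utf8(sample_path))
    os.remove(sample_path)
```

io.open accepts the encoding argument on both Python 2.7 and Python 3, so the same call should work across the versions mentioned in this thread.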

andradadad avatar Apr 02 '20 17:04 andradadad

@andradadad thank you for the thorough testing!

Current support is only up to Python 3.7: https://github.com/opendatakosovo/cyrillic-transliteration/pull/13

If the lack of support for 3.8 is a blocker for you, then I can address it as part of this Issue, but it will take a bit of time. If not, I can create another Issue and address it at a later date, so that I can focus on closing this Issue and merging it to master.

georgeslabreche avatar Apr 02 '20 17:04 georgeslabreche

Seems to be working for me (macOS, Python 2.7, 3.7), thanks!

Also, when this makes its way to master, will any binary be included in the distribution, such as a simple wrapper around python /path/cyrtranslit.py?

097115 avatar Apr 02 '20 17:04 097115

@097115 I was thinking of just a shell script but maybe you have a better suggestion? I wouldn't do anything for the Windows environment though because I currently do not have a setup I could test it in.

georgeslabreche avatar Apr 02 '20 18:04 georgeslabreche

@georgeslabreche, yeah, that's pretty much it: a simple shell script, so that we can use it in pipes right after installing the package.

Thanks once again!

097115 avatar Apr 02 '20 18:04 097115