translate-shell
Translating certain files (Japanese, converted from SHIFT-JIS to UTF-8) has issues
Translating from Japanese (UTF-8) to English using default settings:
trans -from japanese -to english file:///path/to/file.html -o out.html
The resulting out.html always contains:
c=function
Is there a way to debug this?
Could you please provide a minimal example file to reproduce the issue?
This is the test HTML file I use: src.txt.
The problem is not with Japanese or the encoding, but with the format of the file. translate-shell expects the input file to be plain text. Unfortunately, HTML format is not supported by translate-shell.
There is indeed a "Translate a web page" feature, but it is offered directly by Google Translate's web interface and requires a browser; it cannot be used to translate from local files to local files.
Thanks. I did have issues with LC_ALL and the like when I was using Shift-JIS. Only after I converted to UTF-8 did the issue (a warning regarding my LC_ALL) go away.
I went ahead and wrote a script that extracts all the text from the HTML file, line by line. Attached is the resulting file (out.txt).
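For anyone else landing here, the extraction step might look roughly like the sketch below. This is not the script used in this thread (that one was not posted). It is a minimal illustration using Python's standard `html.parser`; the names `TextExtractor` and `extract_text_lines` are made up for this example. Skipping `<script>` and `<style>` contents is an assumption about what should be left out of the translation input.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text from HTML, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Emit one entry per non-empty line of text.
        for line in data.splitlines():
            line = line.strip()
            if line:
                self.lines.append(line)


def extract_text_lines(html):
    """Return the text lines found in an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines
```

To produce a plain-text file, read the HTML with `open(path, encoding="utf-8")`, pass its contents to `extract_text_lines`, and write the returned lines out one per line.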
If I do:
cat out.txt | trans -brief -s google -from japanese -to english > translated.txt
I get 156 newlines, which corresponds to the number of lines in out.txt.
Google banning frequent requests is a known issue here (#349).
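If rate limiting is what is happening, one thing to try is sending the lines in smaller batches with a pause in between. The sketch below assumes `trans` is on the PATH and reuses only the flags from the command above. The batch size and delay are arbitrary guesses, the function names are made up for this example, and nothing here is confirmed to avoid the ban described in #349.

```python
import subprocess
import time


def batches(lines, size):
    """Yield consecutive slices of `lines`, each with at most `size` items."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def run_trans(text):
    """Pipe `text` to trans, using the same flags as the command above."""
    cmd = ["trans", "-brief", "-s", "google",
           "-from", "japanese", "-to", "english"]
    result = subprocess.run(cmd, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def translate_throttled(lines, size=20, delay=5.0,
                        translate=run_trans, sleep=time.sleep):
    """Translate `lines` batch by batch, sleeping `delay` seconds between calls.

    `size` and `delay` are placeholder values, not recommendations.
    """
    out = []
    for i, batch in enumerate(batches(lines, size)):
        if i:  # no pause before the first request
            sleep(delay)
        out.extend(translate("\n".join(batch) + "\n").splitlines())
    return out
```

The `translate` and `sleep` parameters are there only so the batching logic can be tried without actually calling `trans`.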