Translating certain files (Japanese, converted from SHIFT-JIS to UTF-8) has issues

Open ijacquez opened this issue 4 years ago • 5 comments

Translating from Japanese (UTF-8) to English using default settings:

trans -from japanese -to english file:///path/to/file.html -o out.html

Results in out.html always having:

c=function

Is there a way to debug this?

ijacquez avatar Feb 07 '21 20:02 ijacquez

Could you please provide a minimal example file to reproduce the issue?

soimort avatar Feb 07 '21 20:02 soimort

This is the test HTML file I use: src.txt.

ijacquez avatar Feb 07 '21 21:02 ijacquez

The problem is not with Japanese or the encoding, but with the format of the file. translate-shell expects the input file to be plain text; unfortunately, HTML is not supported.

There is indeed a "Translate a web page" feature, but it is offered directly by Google Translate's web interface and requires a browser; it cannot be used to translate from local files to local files.

soimort avatar Feb 08 '21 15:02 soimort

Thanks. I did have issues with LC_ALL and the like when I was using SHIFT-JIS. Only after I converted to UTF-8 did that issue (a warning regarding my LC_ALL) go away.

I went ahead and wrote a script that extracts all the text from the HTML file, line by line. Attached is that file (out.txt).
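(The script itself is not included in the thread. As a rough sketch of what such an extraction could look like, assuming Python and only the standard-library `html.parser` module: the class and function names below are hypothetical, and skipping `<script>` and `<style>` contents is an assumption about what counts as translatable text.)

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes line by line, ignoring <script> and <style> bodies."""

    # Assumption: contents of these tags are code, not translatable text.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Keep one non-empty, stripped line per entry.
        for line in data.splitlines():
            line = line.strip()
            if line:
                self.lines.append(line)


def extract_text_lines(html):
    """Return the non-empty text lines found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines
```

Writing `"\n".join(extract_text_lines(html))` to a UTF-8 file would give a plain-text input of the kind translate-shell expects.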

If I do:

cat out.txt | trans -brief -s google -from japanese -to english > translated.txt

I get 156 newlines, which corresponds to the number of lines in out.txt.

out.txt

ijacquez avatar Feb 09 '21 02:02 ijacquez

Google banning frequent requests is a known issue here (#349).

soimort avatar Feb 09 '21 11:02 soimort