argos-translate

HTML codes appearing in plaintext on the commandline

Open: bruceleerabbit opened this issue 2 years ago • 0 comments

Sometimes the argos-translate output contains a proper apostrophe character (’), and sometimes it contains the HTML entity encoding of an apostrophe instead. HTML may be acceptable in the GUI (where it gets rendered), but it makes no sense on the command line, where plain text is expected.

Sample command:

argos-translate --from-lang en --to-lang fr "The explanation I gave to the ABC representative left no possibility for misinterpretation."

Resulting output:

L ' explication que j ' ai donnée au représentant de l ' ABC n ' a laissé aucune possibilité d ' interprétation erronée.

It’s inconsistent, but reproducible: the sample above shows HTML entities every time, whereas other input text sometimes produces apostrophes as expected.

The HTML entities are also surrounded by spaces, which is probably wrong even in HTML.
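To check which inputs trigger the problem, a small filter can flag output that contains HTML character entities. This is a hypothetical helper, not part of argos-translate, and the entity pattern (decimal, hexadecimal and named entities) is an assumption about what might appear:

```shell
# Hypothetical helper (not part of argos-translate): succeed (exit 0) if stdin
# contains an HTML character entity such as "&#39;", "&#x27;" or "&quot;",
# fail (exit 1) otherwise.
has_entities()
{
    grep -Eq '&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);'
}
```

Usage: `argos-translate --from-lang en --to-lang fr "..." | has_entities && echo "HTML entities in output"`.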

(update)

My workaround is to add this to shell scripts:

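htmldec()
{
    # Remove the spaces argos-translate puts around HTML entities (" &xxx; " -> "&xxx;"),
    # then decode the entities to plain text with Perl's HTML::Entities module.
    sed 's/[ ]*[&]\([^;]*\);[ ]*/\&\1;/g' | perl -MHTML::Entities -pe '$_ = decode_entities($_)'
}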

Then pipe the argos-translate output through it. The ugly leading sed expression removes the spaces surrounding the HTML entities, and the Perl pipeline decodes them into plain text. Removing the spaces could have side effects if characters other than apostrophes come through. Correcting for that would make the sed code really ugly, because double quotes need a space on one side and not the other, and which side depends on whether it is an opening or a closing quote. Another imperfection of this workaround is that Perl decodes the apostrophe entity into a straight single quote ('), whereas some of the text already contains a properly curved apostrophe (’), so the output ends up with a mix of the two.
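Both imperfections could be narrowed with a pre-pass that handles apostrophes and double quotes specifically before the generic `htmldec` decoding. The sketch below is hypothetical and rests on assumptions not confirmed by the issue: that apostrophes arrive as `&#39;`, `&apos;` or `&#x27;` (with optional surrounding spaces), and that double quotes arrive as ` &quot; ` with one space on each side and are balanced within a line. Apostrophes become a curved ’, and `&quot;` alternates between an opening “ (space kept before, dropped after) and a closing ” (space dropped before, kept after):

```shell
# Hypothetical helper (not part of argos-translate).
# Step 1 (sed): apostrophe entities plus surrounding spaces -> curved apostrophe.
# Step 2 (awk): each " &quot; " alternates between an opening and a closing
#               curved quote, keeping only the space outside the quotation.
tidy_quotes()
{
    sed -e 's/ *&#39; */’/g' -e 's/ *&apos; */’/g' -e 's/ *&#x27; */’/g' |
    awk '{
        s = $0; out = ""; open = 1
        # " &quot; " is 8 characters long
        while ((i = index(s, " &quot; ")) > 0) {
            out = out substr(s, 1, i - 1) (open ? " “" : "” ")
            s = substr(s, i + 8)
            open = !open
        }
        print out s
    }'
}
```

Usage: `argos-translate --from-lang en --to-lang fr "..." | tidy_quotes | htmldec`, so that `htmldec` only sees whatever entities remain. If the apostrophe entity is ever used as a closing single quote, the space after it would be lost, so this is a heuristic rather than a complete fix.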

bruceleerabbit, May 19 '22 23:05