pachli-android
(possible) Support privacy-preserving on-device translation for Google Play builds
Been looking into this. I built a working prototype, and a key problem is that ML Kit translation does not play nicely with HTML.
E.g., pass in this content (random German post I found, https://berlin.social/@flostern/114217677861850069):
Original content (line breaks added here for display purposes):
<p><span class="h-card" translate="no"><a href="https://troet.cafe/@krischanski"
class="u-url mention" rel="nofollow noopener" target="_blank">@<span>krischanski</span></a></span>
Ich habe denen gleich mal geschrieben. Nicht nur, dass das teurer wird, ist doch gerade jetzt eine
Migration hin zu einem US-Unternehmen vollkommen daneben. Da kommt die Orange mit einem
Dekret um die Ecke und Schwupp gibt MS denen Zugriff auf unsere Maildaten. Nicht, dass Du was
Falsches über Dikator Trump geschrieben hast.</p>
Content after translation:
<p> <span class = "h-card" translate = "no"> <a href = "https://tret.cafe/@krischanski" Class =
"U-URL Mention" rel = "Nofollow Noopener" Target = "_ blank"> @ <span> Krischanski </ span>
</a> </ span> I have written those right now. Not only is that the more expensive, just now
a migration to an US company is completely wrong. There comes the orange with a decree
around the corner and Schwupp gives ms to those access to our mail data. Not that you
wrote something wrong about dicator trump. </ P>
Note:
- Spaces added around HTML tags and attribute key/value pairs
- Attribute values have been capitalised in some cases
This is enough to break the display of posts.
I think Mastodon works around this by sending the plain text content to the translation service, and then marking up the results.
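For illustration, here's a rough sketch of that "translate the text, keep the markup" idea in Python, using only the standard library. This is not Mastodon's or Pachli's actual code: `TextOnlyTranslator`, `translate_html`, and the `translate` callback are hypothetical names, and the callback stands in for whatever translation engine is used. It passes only text nodes to the translator, copies tags through exactly as written, and leaves anything inside a `translate="no"` element alone. One known weakness of this approach: each text node is translated on its own, so a sentence split by an inline link or hashtag loses context.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextOnlyTranslator(HTMLParser):
    """Re-emit HTML unchanged, passing only text nodes to the translator."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self._translate_text = translate  # hypothetical stand-in for the engine
        self.out = []        # output fragments, joined at the end
        self.open_tags = []  # stack of (tag, inside_no_translate)

    def _skipping(self) -> bool:
        return bool(self.open_tags) and self.open_tags[-1][1]

    def handle_starttag(self, tag, attrs):
        # Copy the raw tag text so the translator never sees it.
        self.out.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            return
        no_translate = dict(attrs).get("translate") == "no"
        self.open_tags.append((tag, no_translate or self._skipping()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br />: emit as-is, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        core = data.strip()
        if not core or self._skipping():
            self.out.append(escape(data, quote=False))
            return
        # Keep surrounding whitespace in case the translator trims it.
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        self.out.append(lead + escape(self._translate_text(core), quote=False) + trail)


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    parser = TextOnlyTranslator(translate)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.out)


# Usage with a fake "translator" (upper-casing) to make the effect visible.
sample = ('<p><span class="h-card" translate="no"><a href="https://troet.cafe/@krischanski" '
          'class="u-url mention">@<span>krischanski</span></a></span> '
          'Ich habe denen gleich mal geschrieben.</p>')
print(translate_html(sample, str.upper))
```

With this shape the mention, its `href`, and the attribute values never reach the translator, so they can't be respaced or recapitalised.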
Subjectively, the off-device translation results are also better. At the time of writing, LibreTranslate converts the original text to:
I wrote to them straight away. Not only will it be more expensive, but migrating to a US company right now is completely wrong. Orange comes around the corner with a decree and MS gives them access to our mail data. Not that you've written anything wrong about dictator Trump.
Google Translate does:
I wrote to them right away. Not only will it be more expensive, but migrating to a US company right now is completely wrong. Then Orange comes around the corner with a decree, and whoosh, MS gives them access to our email data. Not that you wrote anything wrong about dictator Trump.
So I'm going to pause work on this for the moment.
I have a cunning plan to fix this. Not by the end of this month, but hopefully by the end of June...
OK, it was by mid-August, but this is now in, with https://github.com/pachli/pachli-android/pull/1731