apertium-html-tools

Bandwidth-efficient document translation

Open roybaer opened this issue 8 years ago • 3 comments

Hi! I just wanted to share the following idea:

Basics:

  • ODT and DOCX files are ZIP archives
  • Only the textual content is needed for translation
  • Files within ZIP archives are compressed individually
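
To illustrate the last point: each entry in the archive's central directory records its own compression method and compressed size, so entries can be located without decompressing anything. A minimal sketch follows. The function name `listZipEntries` is made up for illustration, and it assumes a plain single-disk archive without ZIP64 extensions:

```javascript
// Sketch only: list the entries of a ZIP archive held in a Uint8Array.
// Assumption: single-disk archive, no ZIP64 extensions.
function listZipEntries(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The end-of-central-directory record sits at the end of the file;
  // scan backwards for its signature (it is at least 22 bytes long).
  let eocd = bytes.length - 22;
  while (eocd >= 0 && view.getUint32(eocd, true) !== 0x06054b50) eocd--;
  if (eocd < 0) throw new Error('not a ZIP archive');

  const count = view.getUint16(eocd + 10, true); // number of entries
  let pos = view.getUint32(eocd + 16, true);     // start of central directory
  const decoder = new TextDecoder();
  const entries = [];

  for (let i = 0; i < count; i++) {
    const nameLen = view.getUint16(pos + 28, true);
    entries.push({
      name: decoder.decode(bytes.subarray(pos + 46, pos + 46 + nameLen)),
      method: view.getUint16(pos + 10, true), // 0 = stored, 8 = deflate
      compressedSize: view.getUint32(pos + 20, true),
      uncompressedSize: view.getUint32(pos + 24, true),
    });
    // Skip the fixed part, the name, the extra field and the comment.
    pos += 46 + nameLen + view.getUint16(pos + 30, true) + view.getUint16(pos + 32, true);
  }
  return entries;
}
```

In a browser, `bytes` could come from `new Uint8Array(await file.arrayBuffer())`. For an odt file one would expect to see `content.xml` next to the embedded pictures, and for docx `word/document.xml`.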

The following process could therefore (perhaps dramatically) reduce the network bandwidth required for document translation:

  1. Read the entire document file into RAM using JavaScript
  2. Copy the (still compressed) entries that hold the textual content into a new buffer and write a new ZIP central directory for them
  3. Submit the stripped-down (but still correct) ZIP file for translation
  4. Put the translated entries from the response back into the original ZIP and rebuild its central directory
  5. Put everything in a data URL and let the user save it to disk
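
Steps 2 and 3 could look roughly like the sketch below. It copies the raw bytes of each kept entry (local header, compressed data and, if present, the data descriptor) and writes a fresh central directory with corrected offsets. `stripZip` and the `keep` callback are hypothetical names. The sketch assumes a single-disk archive without ZIP64 or encryption, and which entries the server actually needs is an assumption that would have to be verified:

```javascript
// Sketch only: build a smaller ZIP containing just the entries accepted by keep(name).
// Assumptions: single-disk archive, no ZIP64, no encryption.
const EOCD_SIG = 0x06054b50, CEN_SIG = 0x02014b50, DESC_SIG = 0x08074b50;

function stripZip(bytes, keep) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Locate the end-of-central-directory record by scanning backwards.
  let eocd = bytes.length - 22;
  while (eocd >= 0 && view.getUint32(eocd, true) !== EOCD_SIG) eocd--;
  if (eocd < 0) throw new Error('not a ZIP archive');
  const total = view.getUint16(eocd + 10, true);
  let cen = view.getUint32(eocd + 16, true);

  const chunks = [];   // raw local entries, copied without recompression
  const records = [];  // central directory records with patched offsets
  let outOffset = 0;
  const decoder = new TextDecoder();

  for (let i = 0; i < total; i++) {
    if (view.getUint32(cen, true) !== CEN_SIG) throw new Error('corrupt central directory');
    const flags = view.getUint16(cen + 8, true);
    const compressedSize = view.getUint32(cen + 20, true);
    const nameLen = view.getUint16(cen + 28, true);
    const cenLen = 46 + nameLen + view.getUint16(cen + 30, true) + view.getUint16(cen + 32, true);
    const local = view.getUint32(cen + 42, true);
    const name = decoder.decode(bytes.subarray(cen + 46, cen + 46 + nameLen));

    if (keep(name)) {
      // Local header: 30 fixed bytes + name + extra field, then the data.
      let end = local + 30 + view.getUint16(local + 26, true) +
                view.getUint16(local + 28, true) + compressedSize;
      // Flag bit 3: a data descriptor (12 bytes, or 16 with signature) follows.
      if (flags & 8) end += view.getUint32(end, true) === DESC_SIG ? 16 : 12;
      chunks.push(bytes.subarray(local, end));

      // Copy the directory record and point it at the entry's new position.
      const record = new Uint8Array(bytes.subarray(cen, cen + cenLen));
      new DataView(record.buffer).setUint32(42, outOffset, true);
      records.push(record);
      outOffset += end - local;
    }
    cen += cenLen;
  }

  // New end-of-central-directory record (no archive comment).
  const cenSize = records.reduce((n, r) => n + r.length, 0);
  const tail = new Uint8Array(22);
  const tv = new DataView(tail.buffer);
  tv.setUint32(0, EOCD_SIG, true);
  tv.setUint16(8, records.length, true);
  tv.setUint16(10, records.length, true);
  tv.setUint32(12, cenSize, true);
  tv.setUint32(16, outOffset, true);

  const out = new Uint8Array(outOffset + cenSize + 22);
  let pos = 0;
  for (const part of [...chunks, ...records, tail]) {
    out.set(part, pos);
    pos += part.length;
  }
  return out;
}
```

For an odt file this might be called as `stripZip(bytes, name => name === 'content.xml')`, where `bytes` is a `Uint8Array` read from the user's file. Step 4 would do the same copying in the other direction: take every entry from the original archive except the translated ones, which come from the response.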

No API changes would be required: the client-side script just strips unneeded content (e.g. pictures) from the ZIP (i.e. odt or docx) file, and Apertium does not care about that content anyway.

roybaer, Feb 19 '16 16:02