Offline Web Page Translation?

Open dm17 opened this issue 5 years ago • 9 comments

How hard would it be to use Apertium's offline functionality for web page translation in Firefox & Chrome (especially on Android)? Currently, there seems to be no way to translate web pages on the fly unless your Android phone includes Google Services, unfortunately. I'm trying to find the easiest way to automatically translate web pages, especially offline, on the Linux desktop and on Android. Thanks!

dm17 avatar Nov 09 '20 11:11 dm17

Offline from a mobile browser extension? That's not exactly trivial. Hardest part is running all the code in JS. https://emscripten.org/ can help with that, but it's a lot of work. https://github.com/ftyers/attjs also started on a part of it. It's something that we have the expertise to do, but definitely not the time for.

An online browser extension would be much easier to make, and we have some code lying around for many parts of it - again, time is the main blocker. For now, if you're online you can just use https://apertium.org/ in any browser.

TinoDidriksen avatar Nov 09 '20 11:11 TinoDidriksen

The problem with using Google Translate or Apertium in the browser is that many of the websites I need to translate are ones where I'm buying a product or paying at a restaurant. It seems like a bad idea to enter credit card information into a website that is potentially sending all of its text somewhere else to be translated... But perhaps I'm wrong.

The other feature I need to completely replace Google Translate is doing OCR on an image and then translating that text. Have you heard of anyone doing that?

dm17 avatar Nov 09 '20 12:11 dm17

As a stopgap for geeks on Linux, https://gist.github.com/unhammer/6900610 will let you select any text, hit a keyboard shortcut and show the translation, using locally installed apertium. (Not a solution for most people of course.)
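
For anyone who would rather script this from Node than from a shell, here is a minimal sketch of only the translation step. It is not what the gist does internally. It assumes the `apertium` command is on the PATH, that it reads text on stdin when given just a language pair, and that the pair you pass (for example `en-es`) is installed. The function name `translateLocally` and the injectable `run` parameter are made up for illustration. Grabbing the selection and displaying the result are left out.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: pipe `text` through a locally installed `apertium`.
// `pair` is a language pair name such as 'en-es' (assumed to be installed).
// `run` defaults to spawnSync and can be swapped out for testing.
function translateLocally(text, pair, run = spawnSync) {
	const result = run('apertium', [pair], { input: text, encoding: 'utf8' });
	// spawnSync reports a failure to start the process via `error`.
	if (result.error) throw result.error;
	if (result.status !== 0) {
		throw new Error(`apertium exited with status ${result.status}: ${result.stderr}`);
	}
	return result.stdout.trim();
}
```

Nothing leaves the machine this way, which is the point of the stopgap.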

unhammer avatar Nov 09 '20 12:11 unhammer

Sweet, I'll think about how this could be done on Android, if at all.

dm17 avatar Nov 09 '20 12:11 dm17

https://news.ycombinator.com/item?id=33792447 - some useful discussion for those researching how to implement offline translation... I'm closing this for now as I don't see apertium being a potential path for this goal.

dm17 avatar Nov 30 '22 08:11 dm17

(Reopening since we do want this, it's just that no one is currently working on it.)

unhammer avatar Nov 30 '22 09:11 unhammer

Emscripten looked easy enough to get running, so I tried it and managed to get CG-3 to build and run. Since CG-3 required that I figure out ICU (and Boost), all of our other tools should be buildable under Emscripten.

JavaScript code, tested in both Chrome and Firefox:

{
	// Wrap the CG-3 C API functions exported by the Emscripten build.
	const cglb = Module.cwrap('cg3_grammar_load_buffer', 'number', ['string', 'number']);
	const cac = Module.cwrap('cg3_applicator_create', 'number', ['number']);
	const crgotf = Module.cwrap('cg3_run_grammar_on_text_fns', null, ['number', 'string', 'string']);

	// Load the grammar from a string. It is pure ASCII, so .length matches its byte count.
	const grammar = 'DELIMITERS = "<.>"; SELECT (tag) ;';
	const g = cglb(grammar, grammar.length);
	const a = cac(g);

	// Write the input to Emscripten's virtual filesystem, then run the grammar on it.
	FS.writeFile('/tmp/input.txt', '"<woærd>"\n\t"woørd" tag\n\t"woård" nottag\n');
	crgotf(a, '/tmp/input.txt', '/tmp/output.txt');

	console.log(FS.readFile('/tmp/output.txt', {'encoding': 'utf8'}));
}

Yields output:

"<woærd>"
	"woørd" tag

So it is absolutely doable. Just need to take the time for it.
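
One detail worth noting if someone reuses the snippet above: the grammar length is taken from the JavaScript string's `.length`, which counts UTF-16 code units. That matches the byte count only because the grammar is ASCII. Assuming `cg3_grammar_load_buffer` expects a byte length and that `cwrap` passes `'string'` arguments as UTF-8, a grammar containing characters like `æ` would need its UTF-8 length instead. Below is a rough sketch of a wrapper that handles this. The names `utf8Length` and `runGrammar` are hypothetical and not part of any Apertium or CG-3 API, and the check for a zero return value assumes the loader returns a null pointer on failure.

```javascript
// Hypothetical helper: number of bytes `text` occupies when encoded as UTF-8.
function utf8Length(text) {
	return new TextEncoder().encode(text).length;
}

// Hypothetical wrapper around the same three CG-3 calls used in the snippet above.
// `Module` and `FS` are the objects provided by the Emscripten build.
function runGrammar(Module, FS, grammar, input) {
	const load = Module.cwrap('cg3_grammar_load_buffer', 'number', ['string', 'number']);
	const create = Module.cwrap('cg3_applicator_create', 'number', ['number']);
	const run = Module.cwrap('cg3_run_grammar_on_text_fns', null, ['number', 'string', 'string']);

	// Pass the UTF-8 byte length rather than the UTF-16 string length.
	const g = load(grammar, utf8Length(grammar));
	// Assumption: a failed load comes back as a null pointer, i.e. 0.
	if (!g) throw new Error('grammar failed to load');
	const a = create(g);

	// Same round trip through the virtual filesystem as in the original snippet.
	FS.writeFile('/tmp/input.txt', input);
	run(a, '/tmp/input.txt', '/tmp/output.txt');
	return FS.readFile('/tmp/output.txt', { encoding: 'utf8' });
}
```

For example, `'SELECT (tæg) ;'.length` is 14, while `utf8Length('SELECT (tæg) ;')` is 15.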

TinoDidriksen avatar Nov 30 '22 13:11 TinoDidriksen

https://github.com/apertium/wasm - I don't intend to do more with it atm. I've proven it can work, but there is no concrete application need yet.

TinoDidriksen avatar Dec 30 '22 07:12 TinoDidriksen

https://leaningtech.com/cheerp-3-0-the-most-advanced-c-compiler-for-the-web-now-permissively-licensed/

TinoDidriksen avatar Mar 14 '23 18:03 TinoDidriksen