
auto trans

Open gedw99 opened this issue 2 years ago • 8 comments

Hey @vorlif

Try this :

https://translate.google.com/?sl=en&tl=de&text=%7B%0A%20%20%22%25%5B1%5Dd%20byte%22%3A%20%7B%0A%20%20%20%20%22one%22%3A%20%22%25%5B1%5Dd%20byte%22%2C%0A%20%20%20%20%22other%22%3A%20%22%25%5B1%5Dd%20bytes%22%0A%20%20%7D%2C%0A%20%20%22%25s%20GB%22%3A%20%22%25s%20GB%22%2C%0A%20%20%22%25s%20KB%22%3A%20%22%25s%20KB%22%2C%0A%20%20%22%25s%20MB%22%3A%20%22%25s%20MB%22%2C%0A%20%20%22%25s%20PB%22%3A%20%22%25s%20PB%22%2C%0A%20%20%22%25s%20TB%22%3A%20%22%25s%20TB%22%2C%0A%20%20%22%25vth_ordinal%2011%2C%2012%2C%2013%22%3A%20%7B%0A%20%20%20%20%22context%22%3A%20%22ordinal%2011%2C%2012%2C%2013%22%2C%0A%20%20%20%20%22other%22%3A%20%22%25vth%22%0A%20%20%7D%2C%0A%20%20%22%2C%20%22%3A%20%22%2C%20%22%2C%0A%20%20%22AM%22%3A%20%22AM%22%2C%0A%20%20%22PM%22%3A%20%22PM%22%2C%0A%20%20%22a.m.%22%3A%20%22a.m.%22%2C%0A%20%20%22midnight%22%3A%20%22midnight%22%2C%0A%20%20%22noon%22%3A%20%22noon%22%2C%0A%20%20%22p.m.%22%3A%20%22p.m.%22%2C%0A%20%20%22today%22%3A%20%22today%22%2C%0A%20%20%22tomorrow%22%3A%20%22tomorrow%22%2C%0A%20%20%22yesterday%22%3A%20%22yesterday%22%0A%7D%0A&op=translate

It fails because it also translates the keys, so building machine translations needs an extractor and a merger that only touch the values.

```
 "midnight": "midnight",
„Mitternacht“: „Mitternacht“,
```
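One way to avoid translating the keys is to decode the JSON first and pass only the string values to the translator. The following is a minimal sketch, not part of spreak: `translateFunc` and `translateValues` are hypothetical names, the translator is a stand-in that just prefixes the text, and skipping the `context` field is an assumption based on the catalog shown in the link above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// translateFunc is a placeholder for whichever machine translation API is used.
type translateFunc func(text string) (string, error)

// translateValues walks a decoded JSON catalog and translates string values only.
// Keys (the message IDs) are copied unchanged. The value of a "context" field is
// also copied unchanged, on the assumption that it is metadata, not display text.
func translateValues(node any, tr translateFunc) (any, error) {
	switch v := node.(type) {
	case string:
		return tr(v)
	case map[string]any:
		out := make(map[string]any, len(v))
		for key, child := range v {
			if key == "context" {
				out[key] = child
				continue
			}
			translated, err := translateValues(child, tr)
			if err != nil {
				return nil, err
			}
			out[key] = translated
		}
		return out, nil
	default:
		// Numbers, booleans, arrays and null are passed through untouched.
		return node, nil
	}
}

func main() {
	src := []byte(`{
		"midnight": "midnight",
		"%[1]d byte": {"one": "%[1]d byte", "other": "%[1]d bytes"}
	}`)

	var catalog map[string]any
	if err := json.Unmarshal(src, &catalog); err != nil {
		panic(err)
	}

	// Stand-in translator: marks the text instead of calling a real API.
	fake := func(s string) (string, error) { return "[de] " + s, nil }

	out, err := translateValues(catalog, fake)
	if err != nil {
		panic(err)
	}
	pretty, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pretty))
}
```

A real translator may also mangle format verbs such as `%[1]d` or `%s`, so those would likely need to be protected separately before the text is sent.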

So I was wondering if there is any code for machine translation using any of the providers?

The flow is that everything runs through machine translation first, and then humans check the result as a second phase.

gedw99 avatar Nov 09 '23 16:11 gedw99

Hi @gedw99,

there is no such thing for Spreak and there will not be in the future. The problem is that many languages have several plural categories. For example, English has two plural categories: One and Other. In contrast, Polish has four: One, Few, Many and Other. For machine translation, it is not easy to find out for which category you want a translation.
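To make the plural problem concrete, here is a small sketch that checks which plural forms a source entry is missing for a target language. The category lists are hard-coded for three languages for illustration only, and `pluralCategories` and `missingForms` are hypothetical names, not part of spreak.

```go
package main

import "fmt"

// pluralCategories lists the cardinal plural categories for a few languages.
// Hard-coded here for illustration; a real tool would take them from CLDR data.
var pluralCategories = map[string][]string{
	"en": {"one", "other"},
	"de": {"one", "other"},
	"pl": {"one", "few", "many", "other"},
}

// missingForms returns the categories that the target language needs
// but that the given plural entry does not provide.
func missingForms(entry map[string]string, lang string) []string {
	var missing []string
	for _, category := range pluralCategories[lang] {
		if _, ok := entry[category]; !ok {
			missing = append(missing, category)
		}
	}
	return missing
}

func main() {
	// An English source entry only carries the two English categories.
	entry := map[string]string{
		"one":   "%[1]d byte",
		"other": "%[1]d bytes",
	}

	fmt.Println(missingForms(entry, "de")) // []
	fmt.Println(missingForms(entry, "pl")) // [few many]
}
```

For German, translating the two English strings one-to-one covers every category. For Polish there is no source string for `few` and `many`, so a machine translator has nothing to translate for those forms.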

But I agree with you that it's an interesting idea. I am therefore currently working on redesigning the catalog processing to make it easier to use. If you like, you can test this and write your own script. If you search for Go libraries for Google Translate, DeepL or similar, you will find plenty.

I have created a gist as a small template of how to start the process.

vorlif avatar Dec 04 '23 22:12 vorlif

Yep, I already use some of those libraries in golang.

Yep, I know the pluralisation problem.

If you get something working I would love to contribute, as it's a very common problem.

This project is easily the best approach I have seen, btw.

gedw99 avatar Dec 05 '23 12:12 gedw99

This is one of the plugins I wanted to integrate, btw. You don't want it in spreak plugins?

It's a good one when combined with caching: you get an immediate translation, and the API is only called on a cache miss. That is good enough for most projects, because the local cache keeps you from being rate limited.

```go
package main

import (
	"encoding/json"
	"fmt"

	gtranslate "github.com/gilang-as/google-translate"
)

func main() {
	// Request: translate the text into English.
	value := gtranslate.Translate{
		Text: "Halo Dunia",
		//From: "id",
		To: "en",
	}

	translated, err := gtranslate.Translator(value)
	if err != nil {
		panic(err)
	}

	// Pretty-print the response for inspection.
	prettyJSON, err := json.MarshalIndent(translated, "", "\t")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(prettyJSON))
}
```
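The caching idea described above could look roughly like the following. This is a sketch under assumptions: `cachedTranslator` is a hypothetical type, the backend is a fake function rather than the library call shown above, and the cache assumes a single target language and is not safe for concurrent use.

```go
package main

import "fmt"

// translateFunc is a placeholder for a real machine translation call.
type translateFunc func(text string) (string, error)

// cachedTranslator remembers earlier results, keyed by source text,
// so the backend is only called on a cache miss.
type cachedTranslator struct {
	backend translateFunc
	cache   map[string]string
	misses  int // number of backend calls, kept for demonstration
}

func newCachedTranslator(backend translateFunc) *cachedTranslator {
	return &cachedTranslator{backend: backend, cache: make(map[string]string)}
}

// Translate returns the cached translation if present, otherwise it asks
// the backend and stores the result. Errors are not cached.
func (c *cachedTranslator) Translate(text string) (string, error) {
	if hit, ok := c.cache[text]; ok {
		return hit, nil
	}
	c.misses++
	out, err := c.backend(text)
	if err != nil {
		return "", err
	}
	c.cache[text] = out
	return out, nil
}

func main() {
	// Fake backend standing in for a rate-limited API.
	backend := func(s string) (string, error) { return "[en] " + s, nil }
	tr := newCachedTranslator(backend)

	for _, text := range []string{"Halo Dunia", "Halo Dunia", "Selamat pagi"} {
		out, err := tr.Translate(text)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	fmt.Println("backend calls:", tr.misses) // backend calls: 2
}
```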

gedw99 avatar Dec 05 '23 12:12 gedw99

Hey @vorlif

Let me know what you think about incorporating the auto translation approach above, and about the wider goals. Feel free to brainstorm with me.

gedw99 avatar Dec 06 '23 17:12 gedw99

Hi @gedw99,

I'm sorry, but I don't think that should be part of this library. There are too many tools/APIs for machine translation, and each user prefers a different one. So the library would have to support several, which I would like to avoid. Another problem is the different plural forms, which cannot be translated properly.

With the above library, legal aspects come into play that I don't want to deal with.

I see two options:

  1. For the translation of JSON files, each user can simply write a small script, similar to the Gist above, and decide for themselves which API they want to use and how they want to deal with the plural forms.
  2. For translations at runtime, you can simply write your own catalog. This could wrap the JSONCatalog and perform the translations at runtime.
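The second option can be sketched as a wrapper that asks an inner catalog first and falls back to machine translation on a miss. Note that the `lookuper` interface below is purely hypothetical and much simpler than a real catalog; it is not spreak's catalog interface, and `mapCatalog` only stands in for something like the JSONCatalog.

```go
package main

import "fmt"

// lookuper is a hypothetical, simplified catalog interface used only for
// this sketch. It is NOT the interface spreak defines.
type lookuper interface {
	Lookup(msgID string) (string, bool)
}

// mapCatalog stands in for a catalog loaded from a JSON file.
type mapCatalog map[string]string

func (m mapCatalog) Lookup(msgID string) (string, bool) {
	text, ok := m[msgID]
	return text, ok
}

// fallbackCatalog wraps another catalog and machine-translates
// message IDs that the inner catalog does not know.
type fallbackCatalog struct {
	inner     lookuper
	translate func(text string) (string, error)
}

func (f fallbackCatalog) Lookup(msgID string) (string, bool) {
	if text, ok := f.inner.Lookup(msgID); ok {
		return text, true
	}
	text, err := f.translate(msgID)
	if err != nil {
		// Report a miss so the caller can fall back to the source text.
		return "", false
	}
	return text, true
}

func main() {
	human := mapCatalog{"midnight": "Mitternacht"}
	// Fake translator standing in for a real API call.
	machine := func(s string) (string, error) { return "[mt] " + s, nil }

	catalog := fallbackCatalog{inner: human, translate: machine}

	for _, id := range []string{"midnight", "noon"} {
		text, ok := catalog.Lookup(id)
		fmt.Println(id, "->", text, ok)
	}
}
```

With this layout, human translations always win, and the machine translator is only consulted for messages nobody has translated yet.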

If you create a catalog or script, I would be happy to link it in the README. But I don't think it should be part of the library itself.

Were you thinking more of a translation of files during development or a translation at runtime?

vorlif avatar Dec 06 '23 21:12 vorlif

Ok got it @vorlif

Really appreciate the feedback as I like this system a lot.

So the machine translation will live in my repo, and then I can rig things up so that the catalog system uses my machine translator. I am not sure how just yet, but that's the plan.

Is that cool?

gedw99 avatar Dec 07 '23 14:12 gedw99

Hi @gedw99,

I'm glad to hear that.

I think that sounds like a good plan. If you have something, I'd be happy to link it in the README.

If you need any clarification about creating a catalog, I will be happy to assist you. Would you like to have machine translation performed during development or at runtime of your program?

vorlif avatar Dec 07 '23 16:12 vorlif

hey @vorlif

Will get back to you when I have something to link to. Have no time right now though…

Machine translation at dev time is my thinking, so the code holds everything. This means that the cache is committed to a repo, which is fine as it's just some JSON files. Not big, and mergeable.
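A committed cache of that kind could be a plain JSON file mapping source text to translation. The sketch below uses hypothetical helper names (`encodeCache`, `saveCache`, `loadCache`) and an assumed file name; it relies on Go's `encoding/json` writing map keys in sorted order, which keeps the file stable between runs and the diffs small.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// encodeCache renders the cache as indented JSON. encoding/json sorts map
// keys, so the same content always produces the same bytes.
func encodeCache(cache map[string]string) ([]byte, error) {
	data, err := json.MarshalIndent(cache, "", "  ")
	if err != nil {
		return nil, err
	}
	return append(data, '\n'), nil
}

// saveCache writes the cache to path so it can be committed to the repo.
func saveCache(path string, cache map[string]string) error {
	data, err := encodeCache(cache)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadCache reads the cache from path. A missing file yields an empty cache.
func loadCache(path string) (map[string]string, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return map[string]string{}, nil
	}
	if err != nil {
		return nil, err
	}
	cache := map[string]string{}
	if err := json.Unmarshal(data, &cache); err != nil {
		return nil, err
	}
	return cache, nil
}

func main() {
	dir, err := os.MkdirTemp("", "mt-cache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Hypothetical file name; one file per target language is one option.
	path := filepath.Join(dir, "de.cache.json")

	cache, err := loadCache(path)
	if err != nil {
		panic(err)
	}
	cache["noon"] = "Mittag"
	cache["midnight"] = "Mitternacht"

	if err := saveCache(path, cache); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}
```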

gedw99 avatar Dec 09 '23 20:12 gedw99