
Add glossary to DeepL API

Open ulliholtgrave opened this issue 2 years ago • 1 comments

Motivation

We want to add and use a glossary for certain words. This should be maintained via some private area/file.

Additional Context

See https://www.deepl.com/de/docs-api/managing-glossaries/list-language-pairs-supported-by-glossaries/

ulliholtgrave avatar Apr 06 '22 15:04 ulliholtgrave

We have decided that there is only one Integreat-wide glossary

dkehne avatar May 17 '22 13:05 dkehne

Is the goal that municipalities can add/edit glossary entries on their own?

Or should only staff roles be able to do that?

charludo avatar Nov 19 '22 14:11 charludo

This should only be done by the staff roles. However, I am not really sure about the way we want to provide it to them.

@osmers Can you bring this up in your team call and come up with an idea about how you want to edit these files?

My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

ulliholtgrave avatar Nov 19 '22 15:11 ulliholtgrave

My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

I would definitely prefer the database! :sweat_smile:

timobrembeck avatar Nov 19 '22 15:11 timobrembeck

My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

I would definitely prefer the database! 😅

I definitely agree that the database would be the better solution, but I am a little bit afraid of the number of phrases we are supposed to show in the UI. If we end up with >200 entries with input, we really need some decent UI to manage this, and a JSON in a TXT file might be easier to implement and, with Ctrl+F ("Strg+F"), the more direct way 🤷‍♂️

ulliholtgrave avatar Nov 19 '22 16:11 ulliholtgrave

We could implement a csv or json import/export? And we could intercept Ctrl + F to directly jump into our own search input field? :smile:

timobrembeck avatar Nov 19 '22 16:11 timobrembeck

I mean, in theory we can also outsource the storing of information to DeepL itself. And only query the stored entries from time to time and store them in our cache (if even necessary)...

timobrembeck avatar Nov 19 '22 16:11 timobrembeck

Ah ok, just read the API docs, and apparently glossaries can only be created and deleted, not modified. So modification only works via retrieving, then modifying locally, then deleting the old entry and uploading the new entry. And the API accepts entries via csv. So the simplest solution would probably be the following:

  • Add a simple csv upload, and just validate the file and pass the entries on to DeepL via the python client
  • List all existing glossaries in a simple UI (we don't store the information in our database, but just query it from DeepL each time and store it in our cache which we should invalidate each time we're creating or deleting a glossary)
  • Provide a simple "download" and "delete" functionality for existing glossaries

What do you think?
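
A minimal sketch of the upload-and-replace flow described above, under a few assumptions: the helper names `parse_glossary_csv` and `replace_glossary` are hypothetical, the CSV is assumed to have exactly two columns (source term, target term), and `translator` is assumed to be a `deepl.Translator` from the official Python client (only `list_glossaries`, `create_glossary` and `delete_glossary` are called on it). Caching and the UI are left out.

```python
import csv
import io


def parse_glossary_csv(csv_text):
    """Validate a two-column CSV (source term, target term) and return a dict.

    Hypothetical helper: the two-column layout is an assumption of this sketch.
    """
    entries = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        source, target = (cell.strip() for cell in row)
        if not source or not target:
            raise ValueError(f"line {line_no}: empty term")
        if source in entries:
            raise ValueError(f"line {line_no}: duplicate source term {source!r}")
        entries[source] = target
    if not entries:
        raise ValueError("glossary is empty")
    return entries


def replace_glossary(translator, name, source_lang, target_lang, entries):
    """Create a new glossary, then delete older ones with the same name and pair.

    Glossaries cannot be modified in place, so "editing" means create + delete.
    """
    created = translator.create_glossary(name, source_lang, target_lang, entries)
    wanted = (name, source_lang.lower(), target_lang.lower())
    for glossary in translator.list_glossaries():
        key = (glossary.name, glossary.source_lang.lower(), glossary.target_lang.lower())
        if key == wanted and glossary.glossary_id != created.glossary_id:
            translator.delete_glossary(glossary)
    return created


# Validation only; no request is sent to DeepL here.
print(parse_glossary_csv("Rathaus,town hall\n"))
```

The sketch creates the new glossary before deleting the old one, so a failed upload does not leave the account without a glossary. That ordering is a design choice of the sketch, not something the thread or the API prescribes.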

timobrembeck avatar Nov 19 '22 17:11 timobrembeck

Yes, I agree. That sounds good 👍

ulliholtgrave avatar Nov 19 '22 17:11 ulliholtgrave

I think we can currently download the glossary as an Excel file - so we could provide that? Since we don't need to change it constantly, an option to download, amend and then upload again would be sufficient. Not sure if that is what Timo was referring to...

osmers avatar Nov 19 '22 17:11 osmers

Wait, we already have a DeepL Pro Advanced account? Then the basic functionality we're talking about here should already be offered by the DeepL web UI (see here)? I think there is no need to implement this in the CMS if we're just implementing exactly the same functionality as DeepL itself... :thinking:

timobrembeck avatar Nov 19 '22 18:11 timobrembeck

Not sure if we do - I assume so, yes, because otherwise we would not have enough translation budget. The glossary is currently implemented in MemoQ. I will check DeepL for you - one sec.

osmers avatar Nov 19 '22 18:11 osmers

Indeed: [Screenshot of the DeepL web translator, 2022-11-19]

timobrembeck avatar Nov 19 '22 18:11 timobrembeck

So you already checked our account? Then this should be easy enough, right?

osmers avatar Nov 19 '22 18:11 osmers

But it seems that glossaries don't work for most of our language pairs...

osmers avatar Nov 19 '22 18:11 osmers

[image] https://support.deepl.com/hc/en-us/articles/360021634540-About-the-glossary-feature

osmers avatar Nov 19 '22 18:11 osmers

Just English and French are possible...

osmers avatar Nov 19 '22 18:11 osmers

Oh, and I noticed another problem: we use two different accounts. The glossary can only be uploaded via the UI for the "DeepL Pro" account, and we perform our automated translations with the "DeepL API Free" account. There is probably no completely trivial way of copying the glossaries... We could however ask the DeepL support whether it's possible to transfer glossaries between accounts, but they will probably refuse to do so.

So back to the drawing board, we probably need to copy the basic upload in our CMS to be able to pass the glossaries to the API account. But yes, let's talk about whether the effort is justified when only two languages are supported with German as source language...

timobrembeck avatar Nov 19 '22 18:11 timobrembeck

So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe putting in alternatives for the word that DeepL provides if you translate just the word and tell the system or whatever that if it finds one of those, to replace it with ours from the glossary?

osmers avatar Nov 19 '22 18:11 osmers

Do you need any more input from our side on this?

osmers avatar Dec 01 '22 11:12 osmers

Do you need any more input from our side on this?

Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests. So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages.

So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe putting in alternatives for the word that DeepL provides if you translate just the word and tell the system or whatever that if it finds one of those, to replace it with ours from the glossary?

You mean like completely implement our own glossary? This would definitely be a lot of effort. Maybe even more effort than having to manually fix machine translations in case potential glossary terms have been translated incorrectly. But yes, in theory it's doable.

timobrembeck avatar Dec 01 '22 12:12 timobrembeck

Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests. So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages

Ditto - just for two languages it does not make sense - we would need to check the terms for all the languages we have a glossary for.

You mean like completely implement our own glossary? This would definitely be a lot of effort. Maybe even more effort than having to manually fix machine translations in case potential glossary terms have been translated incorrectly.

Not sure how feasible and realistic manual fixing is - but yes, that is essentially what I meant. But I can see how it is very difficult. Another idea I had was that we compile an alternative list of words, like you always have in dictionary suggestions (e.g. Straße can be road and street in English). So if we have this list, we can at least tell the system that if it finds any of those words, to replace it with the correct one from our glossary?

I am not sure if this is feasible though due to case declension of words (dative, genitive, etc. adjustments...)
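
To make the concern concrete, here is a rough sketch of the "replace alternatives with the glossary term" idea, applied to already translated English text. The function name `enforce_glossary` and the word list are hypothetical, and this is not an existing CMS or DeepL feature. It only matches whole words, so any inflected form slips through:

```python
import re


def enforce_glossary(text, alternatives):
    """Replace whole-word occurrences of each unwanted alternative
    with the preferred glossary term (naive post-processing)."""
    for unwanted, preferred in alternatives.items():
        pattern = re.compile(rf"\b{re.escape(unwanted)}\b", flags=re.IGNORECASE)
        text = pattern.sub(preferred, text)
    return text


# Hypothetical list: the glossary prefers "street" over "road".
alternatives = {"road": "street"}

print(enforce_glossary("Turn left at the road.", alternatives))
# Turn left at the street.

print(enforce_glossary("Both roads are closed.", alternatives))
# Both roads are closed.   (the plural "roads" is not matched)
```

Dropping the word boundary would catch "roads" but also rewrite unrelated words such as "broadcast", and languages with case endings make both problems worse.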

osmers avatar Dec 01 '22 12:12 osmers

I am not sure if this is feasible though due to case declension of words (dative, genitive, etc. adjustments...)

Hmm, in my opinion we're opening Pandora's Box here :sweat_smile: I guess that's just one of the limitations of machine translations - there is always some margin of error which can either be accepted or fixed by humans. I doubt that any manual string replacement on our side is good enough to fix more problems than it causes. So at the moment, I'd suggest putting this on hold until DeepL supports more languages for the glossary - and as soon as this is the case, I think the effort for implementing support for DeepL's glossary mechanism is justified.

timobrembeck avatar Dec 01 '22 14:12 timobrembeck

Agreed!

osmers avatar Dec 01 '22 14:12 osmers

Just saw that DeepL now supports more languages - [image]

The question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we using Pro by now anyway?

Edit: For DeepL API Free and DeepL API Pro subscribers

You can create glossaries with your DeepL API (Free and Pro) subscription. Please consult this article and our API documentation to learn how you can manage glossaries with the DeepL API.

If you use the DeepL API (Free and Pro) in third-party software, please note that plug-ins are not developed by DeepL SE. DeepL supports glossary functionality via the API, but your plug-in provider might require some time to implement this functionality in their plug-in. For more information, please contact the provider of your plug-in.

osmers avatar Oct 31 '23 17:10 osmers

@osmers good catch!

The question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we using Pro by now anyway?

We can only use the API account for the CMS because it would be way too complicated to program any kind of interaction between the CMS and the DeepL UI – it only makes sense to interact via the API, which is only possible with an API account. Fortunately, this feature was enabled for the API as well, also with more languages :tada: So I think this issue is no longer blocked and can be prioritized (although keep in mind that I estimate the effort to be high despite the new feature).

Supported Languages
In [1]: import deepl

In [2]: from django.conf import settings

In [3]: glossary_languages = deepl.Translator(settings.DEEPL_AUTH_KEY).get_glossary_languages()
Oct 31 17:11:34 INFO deepl - Request to DeepL API method=GET url=https://api-free.deepl.com/v2/glossary-language-pairs
Oct 31 17:11:34 INFO deepl - DeepL API response status_code=200 url=https://api-free.deepl.com/v2/glossary-language-pairs

In [4]: for language_pair in glossary_languages:
 ...:     print(f"{language_pair.source_lang} to {language_pair.target_lang}")
 ...:
de to en
de to es
de to fr
de to ja
de to it
de to pl
de to nl
de to zh
de to ru
de to pt
en to de
en to es
en to fr
en to ja
en to it
en to pl
en to nl
en to zh
en to ru
en to pt
es to de
es to en
es to fr
es to ja
es to it
es to pl
es to nl
es to zh
es to ru
es to pt
fr to de
fr to en
fr to es
fr to ja
fr to it
fr to pl
fr to nl
fr to zh
fr to ru
fr to pt
ja to de
ja to en
ja to es
ja to fr
ja to it
ja to pl
ja to nl
ja to zh
ja to ru
ja to pt
it to de
it to en
it to es
it to fr
it to ja
it to pl
it to nl
it to zh
it to ru
it to pt
pl to de
pl to en
pl to es
pl to fr
pl to ja
pl to it
pl to nl
pl to zh
pl to ru
pl to pt
nl to de
nl to en
nl to es
nl to fr
nl to ja
nl to it
nl to pl
nl to zh
nl to ru
nl to pt
zh to de
zh to en
zh to es
zh to fr
zh to ja
zh to it
zh to pl
zh to nl
zh to ru
zh to pt
ru to de
ru to en
ru to es
ru to fr
ru to ja
ru to it
ru to pl
ru to nl
ru to zh
ru to pt
pt to de
pt to en
pt to es
pt to fr
pt to ja
pt to it
pt to pl
pt to nl
pt to zh
pt to ru
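
A small sketch of how a list like the one above could be checked against the languages the CMS translates into. The function name `uncovered_targets` is hypothetical, and the target list (`ar` and `fa` in particular) is only an assumed example, not taken from the CMS configuration. The pairs are plain `(source, target)` tuples, e.g. built from `source_lang` and `target_lang` of each entry returned by `get_glossary_languages()`:

```python
def uncovered_targets(pairs, source_lang, wanted_targets):
    """Return the wanted target languages that have no glossary pair
    for the given source language, sorted alphabetically."""
    supported = {target for source, target in pairs if source == source_lang}
    return sorted(set(wanted_targets) - supported)


# Excerpt of the pairs printed above, as (source, target) tuples.
pairs = [("de", "en"), ("de", "fr"), ("de", "ru"), ("en", "de")]

# Assumed example targets; "ar" and "fa" do not appear in the list above.
print(uncovered_targets(pairs, "de", ["en", "ru", "ar", "fa"]))
# ['ar', 'fa']
```

Target languages reported this way would still be machine translated without a glossary.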

timobrembeck avatar Oct 31 '23 17:10 timobrembeck

@timobrembeck yup, I found the info as well that we can use the API Free Account to implement the glossary :) nice!! It's something we need to do in order to make automatic translations better. So I think even though the effort is high, it is something we should do soonish :) like next quarter

osmers avatar Oct 31 '23 17:10 osmers

Push to backlog. This is not as urgent as other tickets.

dkehne avatar Dec 11 '23 11:12 dkehne