Multilanguage (translation) support + support for local OpenAI server realizations

Open janvarev opened this issue 2 years ago • 20 comments

Added support for translating the request from the user's language (UserLang) to English and translating the result back.

Supports GoogleTranslator from the deep_translator package, and OneRingTranslator, a separate REST server.

By DEFAULT, it is turned OFF, of course.

The params have moved to the .env file; I keep them here for reference.

import os

params = {
    'translator': os.environ.get('TRANSLATE_ENGINE', "GoogleTranslator"),  # GoogleTranslator or OneRingTranslator
    'custom_url': os.environ.get('TRANSLATE_CUSTOM_URL', "http://127.0.0.1:4990/"),  # custom URL for the OneRingTranslator server
    'user_lang': os.environ.get('TRANSLATE_USER_LANG', 'en'),  # two-letter user language code like "fr", "es", etc.; "en" means NO translation
    'translate_user_input': (os.environ.get('TRANSLATE_USER_INPUT', "0") == "1"),  # translate user input to EN
    'translate_system_output': (os.environ.get('TRANSLATE_SYSTEM_OUTPUT', "0") == "1"),  # translate system output to UserLang
}
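
To make the flow concrete, here is a minimal sketch of how these flags could gate the round trip. It is not the code from the PR: the function name `ask_with_translation` and the `translate(text, from_lang, to_lang)` callable are hypothetical, and the translator is passed in so the sketch runs without any network access.

```python
from typing import Callable, Dict

# Hypothetical signature for any translation backend: (text, from_lang, to_lang) -> text
Translate = Callable[[str, str, str], str]


def ask_with_translation(
    query: str,
    answer_fn: Callable[[str], str],
    translate: Translate,
    params: Dict[str, object],
) -> str:
    """Translate the query to English, ask the model, translate the answer back."""
    user_lang = str(params.get("user_lang", "en"))
    if user_lang == "en":
        # "en" means no translation at all, matching the comment in the params dict.
        return answer_fn(query)
    if params.get("translate_user_input"):
        query = translate(query, user_lang, "en")
    answer = answer_fn(query)
    if params.get("translate_system_output"):
        answer = translate(answer, "en", user_lang)
    return answer


def fake_translate(text: str, from_lang: str, to_lang: str) -> str:
    """Stand-in backend that only tags the text, so the flow is visible offline."""
    return f"[{from_lang}->{to_lang}] {text}"
```

With both flags off or `user_lang` set to "en", the query reaches the model unchanged, which matches the default-off behaviour described above.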

Also, it is better to use a multilingual embedder like https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 when building a multilingual DB.
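
As a rough illustration of what a multilingual embedder buys you, the sketch below embeds sentences with that model and compares them by cosine similarity. It assumes the sentence-transformers package is installed and the model can be downloaded; the helper names `embed` and `cosine` are hypothetical and not part of the PR.

```python
import math
from typing import List, Sequence

MODEL_NAME = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Embed texts with the multilingual model (downloads it on first use)."""
    # Imported lazily so the rest of this module works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    return model.encode(list(texts)).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `cosine(*embed(["Where is the station?", "Où est la gare ?"]))` would be expected to score noticeably higher than a pair of unrelated sentences, though the exact values depend on the model.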

Added an OpenAILocal LLM model. https://github.com/oobabooga/text-generation-webui supports running LLM models locally and exposes an OpenAI interface through a plugin.

It's significantly faster to call this interface than to run LlamaCpp locally. But it's optional.
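
For readers unfamiliar with this setup, here is a minimal sketch of calling a local OpenAI-compatible completions endpoint with only the standard library. The base URL, port, and the absence of a `model` field are assumptions about the local server, not something taken from the PR; adjust them to whatever your plugin actually exposes.

```python
import json
import urllib.request

# Assumption: address of the local OpenAI-compatible server; change to match your setup.
BASE_URL = "http://127.0.0.1:5001/v1"


def build_completion_request(
    prompt: str, max_tokens: int = 256, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request in the OpenAI completions format."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 256, base_url: str = BASE_URL) -> str:
    """Send the prompt to the local server and return the first completion text."""
    request = build_completion_request(prompt, max_tokens, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Because the request never leaves 127.0.0.1, this path keeps the data on the machine while still using the OpenAI wire format.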

janvarev avatar May 20 '23 11:05 janvarev

The OneRingTranslator project, if someone needs it: https://github.com/janvarev/OneRingTranslator

janvarev avatar May 20 '23 11:05 janvarev

The translation part feels like bloating the main script. I think it'd be much better if it was in a separate file/module.

maozdemir avatar May 20 '23 12:05 maozdemir

@maozdemir Ok, moved to separate translation.py

janvarev avatar May 20 '23 12:05 janvarev

If this is doing external http requests to Google or any other server, I'm quite a bit against this (totally against if it's default on). The point of this repo is about being 100% private and self-hosted.

In general, can't the user translate the result himself if he wants? Or use a model in a different language directly? This way he can choose the best tool/service for the job and we don't have to choose for them and add privacy nightmares to this repo.

PulpCattel avatar May 20 '23 12:05 PulpCattel

@PulpCattel It's default OFF, because by default we will not translate at all.

It CAN BE turned ON. Thus, the user has two choices:

  • translate with GoogleTranslate (the default engine; easier, but it uses Google servers, yeah)
  • set up a local OneRingTranslator translation server and configure it to do everything offline (harder). OneRingTranslator lets you set up translation the way you want and supports plugins for this, so it's a way for the user to find the best tool.

In general, can't the user translate the result himself if he wants?

We don't need to translate only the results; we need to translate the user input (context) too.

Or use a model in a different language directly?

This produces worse results in the common case. As a non-English speaker I would like to do this, but the results are really worse.

janvarev avatar May 20 '23 13:05 janvarev

If this is doing external http requests to Google or any other server, I'm quite a bit against this (totally against if it's default on). The point of this repo is about being 100% private and self-hosted.

In general, can't the user translate the result himself if he wants? Or use a model in a different language directly? This way he can choose the best tool/service for the job and we don't have to choose for them and add privacy nightmares to this repo.

Not all languages have a decent model, so honestly this would be a decent addition. Though yeah, I am not sure whether this is bloat or not...

Sure, privacy is a concern, but so is quality. This project can actually be used for research in a single area/topic by loading the related documents. I doubt that it'll be that useful, but it can still produce some decent results on a good computer, with a decent model and set of documents.

IMO the decision should be left to the end user, yet I doubt that this solution is THE solution.

Running a local Web server just to translate is very much unneeded, and pretty much out of the scope of this project.

The user can rely on the googletrans package (which is way simpler than this), and even the offline package that is used in the proposed code is simpler to use, without running a local server (https://argos-translate.readthedocs.io/en/latest/).

maozdemir avatar May 20 '23 13:05 maozdemir

@maozdemir By default we use GoogleTranslate, and no separate server.

I provide the server option because I'm developing it, and it can be customized by plugins. Your example, Argos, is included in LibreTranslate; there is a plugin in OneRing to support a local or remote LibreTranslate server if needed. So, it's just an option, if the user wants to adjust it.

Alternatively, you can add other translation code right inside the main_translator function in translator.py, adding another option if you are not satisfied with the existing ones.

janvarev avatar May 20 '23 13:05 janvarev

IMO the decision should be left to the end user, yet I doubt that this solution is THE solution.

I very much agree with this, but I thought this repo was about privacy. If the user wants a more convenient tool my idea was to just recommend some other software instead, or use other non-private tools on top. I also don't like to add dependencies that most people won't use at all, we could at least make these new dependencies entirely optional (e.g., a translate-requirements.txt file).

I didn't look at the code, so, as you say, it's very much possible that it could be improved.

All that being said, if it's entirely optional and default off, and y'all really want this, I won't stand in the way.

PulpCattel avatar May 20 '23 13:05 PulpCattel

@maozdemir By default we use GoogleTranslate, and no separate server.

I provide the server option because I'm developing it, and it can be customized by plugins. Your example, Argos, is included in LibreTranslate; there is a plugin in OneRing to support a local or remote LibreTranslate server if needed. So, it's just an option, if the user wants to adjust it.

Alternatively, you can add other translation code right inside the main_translator function in translator.py, adding another option if you are not satisfied with the existing ones.

You'll still be running a local server for API, since your proposal does make API calls to localhost.

Also I doubt that running a local translation server along with this project is a good idea, considering the performance and limited resources of consumer level computers.

maozdemir avatar May 20 '23 13:05 maozdemir

One more word about OneRingTranslator:

I made it because I didn't want to keep asking developers "Oh, please, add the DeepL translator to your software... Please, add Argos... FB NLLB..." and so on.

So, for software developers there is ONE point for translation: a simple REST server. If the user is not satisfied, he can customize it the way he wants, and from then on it's not the software developer's problem to support a lot of translator variations. You can answer "Please, set up your own plugin in OneRingTranslator and use it!"

Of course, it's an OPTIONAL point, and the user can quickly solve his problems with just Google Translate (the default engine, no adjustment needed).
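
As a sketch of what that single REST point looks like from the client side, the snippet below builds and sends one GET request. The `/translate` path, the `text`/`from_lang`/`to_lang` parameter names, and the `result`/`error` response fields are assumptions here; check the OneRingTranslator README for the actual contract.

```python
import json
import urllib.parse
import urllib.request


def build_translate_url(custom_url: str, text: str, from_lang: str, to_lang: str) -> str:
    """Build the GET URL for the (assumed) /translate endpoint."""
    query = urllib.parse.urlencode(
        {"text": text, "from_lang": from_lang, "to_lang": to_lang}
    )
    return f"{custom_url.rstrip('/')}/translate?{query}"


def translate_via_rest(custom_url: str, text: str, from_lang: str, to_lang: str) -> str:
    """Call the translation server and return the translated text."""
    url = build_translate_url(custom_url, text, from_lang, to_lang)
    with urllib.request.urlopen(url, timeout=30) as response:
        body = json.load(response)
    if body.get("error"):
        # Assumed error field; surface it instead of returning untranslated text.
        raise RuntimeError(f"Translation server error: {body['error']}")
    return body["result"]
```

With `custom_url` left at the default `http://127.0.0.1:4990/`, the request stays on the local machine.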

janvarev avatar May 20 '23 13:05 janvarev

Also I doubt that running a local translation server along with this project is a good idea, considering the performance and limited resources of consumer level computers.

Any other ideas? We must choose: if the user wants full privacy, he must set up a local installation (it's not so heavy, really). Otherwise he can send his data to an online server; I can't see how to get around that.

janvarev avatar May 20 '23 13:05 janvarev

IMO the decision should be left to the end user, yet I doubt that this solution is THE solution.

I very much agree with this, but I thought this repo was about privacy. If the user wants a more convenient tool my idea was to just recommend some other software instead, or use other non-private tools on top. I also don't like to add dependencies that most people won't use at all, we could at least make these new dependencies entirely optional (e.g., a translate-requirements.txt file).

I didn't look at the code, so, as you say, it's very much possible that it could be improved.

All that being said, if it's entirely optional and default off, and y'all really want this, I won't stand in the way.

Exactly. Actually, since the project itself says "this will run even if you are offline", no online stuff should be added. I agree with you on that. However, end users of other languages will be limited to low-quality models in that case.

So, your idea is actually a good one, adding optional dependencies that won't work offline.

But due to my comment(s) above, if we are sticking to offline only, the translation stuff is pretty much impossible.

maozdemir avatar May 20 '23 13:05 maozdemir

@maozdemir Ok, moved dependency to separate requirements-translate.txt
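
One way to keep such dependencies truly optional at runtime is to check for them only when translation is switched on. The following is a minimal sketch of that idea, not code from the PR; the helper names and the install hint wording are hypothetical.

```python
import importlib.util
from typing import Iterable


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_modules(
    modules: Iterable[str],
    hint: str = "pip install -r requirements-translate.txt",
) -> None:
    """Raise a readable error if any optional module is missing."""
    missing = [name for name in modules if not is_installed(name)]
    if missing:
        raise RuntimeError(
            f"Translation is enabled but these packages are missing: "
            f"{', '.join(missing)}. Try: {hint}"
        )
```

Calling `require_modules(["deep_translator"])` just before the first translation keeps the core install free of translation packages for users who never enable the feature.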

janvarev avatar May 20 '23 13:05 janvarev

@janvarev Shouldn't we instead propose an implementation that offers Google's T5 models via a Hugging Face LLM, as an option to download models that understand various languages? I doubt that ChatGPT, for example, pings Google Translate when you type in your language.

sime2408 avatar May 20 '23 15:05 sime2408

@janvarev See, as a developer I'd rather use a library that handles the internet connection or runs a language model in the background and returns the results to me, not a background server that I'd have to query AND that would run alongside my software, all while achieving the same in a much easier way is possible. I'm not insulting your work, just pointing out some facts.

Anyways, let's not go off-topic.

maozdemir avatar May 20 '23 16:05 maozdemir

Guys, thanks to everyone! I know we can imagine a better solution: offline, based on models, etc. You are completely right. But for now the question is: is this solution good enough to approve?

The solution can become better later. We can just add another option in the main_translator function, where a chain of IFs selects the way to translate.

    if params['translator'] == "GoogleTranslator":
        from deep_translator import GoogleTranslator
        res = GoogleTranslator(source=from_lang, target=to_lang).translate(string)
    if params['translator'] == "OneRingTranslator":
        custom_url = params['custom_url']

If we get another good solution, we can add it here and make it the default. IMHO it'll be easier later, once some solution has already been approved and users can solve their tasks. Someone can provide a better solution and PR it.

So, the question is: is this good enough? (Because, as for me, I have little time right now to dig into it further.)

janvarev avatar May 20 '23 16:05 janvarev

@maozdemir could you look into merging this PR please?

I’d love to try privategpt but I wish to use it with text-generation-webui instead of llama.cpp

LoopControl avatar May 28 '23 04:05 LoopControl

@maozdemir could you look into merging this PR please?

I’d love to try privategpt but I wish to use it with text-generation-webui instead of llama.cpp

I just had a thought: we should finally open a Discord channel to discuss which PRs to review and which are prioritized, with separate discussions for UI, translation, GPU... and the ideas for future development. I think translation is a very good proposal, though I would like an LLM model to do it. @imartinez can you please create a channel?

sime2408 avatar May 28 '23 05:05 sime2408

Hey @sime2408, I totally agree. Discord will bring more synchronous and richer communication, which is great for discussions and agreements, but it also requires a large amount of attention, presence, moderation, and management in general. I'm getting everything ready to be able to provide that to the community.

I'll create a discussion in the GitHub Discussions section to get more ideas about how to run it.

imartinez avatar May 28 '23 07:05 imartinez

but also requires a big amount of attention, presence, moderation, and management in general. I'm getting everything ready to be able to provide that to the community.

I can assist as an administrator if you need help; my name there is orbita24. Someone already created one, where we can add you as an admin: privateGPT

sime2408 avatar May 28 '23 07:05 sime2408