gpt4-pdf-chatbot-langchain

Does it work in languages other than English?

Open luisrock opened this issue 1 year ago • 7 comments

I am trying it with a PDF in Portuguese and the results are awful. I've updated the prompt, but still...

luisrock • Apr 04 '23 01:04

I have run into the same problem. I am trying it with a Chinese document, but the results are not very good.

yinshipeng • Apr 04 '23 05:04

Can you describe its behavior?

I translated QA_PROMPT and CONDENSE_PROMPT into Portuguese, then ingested some Portuguese PDFs, and it works as expected, answering in Portuguese.

However, it fails to connect one question to the next, as if it stays stuck on the initial results and doesn't "refresh" them. I don't think this is related to the translation.
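For reference, a translated setup might look roughly like the sketch below. It is hypothetical: the Portuguese wording is illustrative rather than the repository's actual CONDENSE_PROMPT and QA_PROMPT text, the `{chat_history}`, `{question}` and `{context}` placeholder names are assumed, and `fillTemplate` is a tiny stand-in for LangChain's prompt templating so the snippet runs on its own. The condense prompt is the step that rewrites a follow-up plus the chat history into a standalone question before retrieval, so it is the part most related to connecting one question to the next.

```typescript
// Hypothetical Portuguese prompts. The wording is illustrative, not the
// repository's actual CONDENSE_PROMPT / QA_PROMPT text.
const CONDENSE_PROMPT_PT = `Dada a conversa abaixo e uma pergunta de acompanhamento, reformule a pergunta de acompanhamento como uma pergunta independente.

Histórico da conversa:
{chat_history}
Pergunta de acompanhamento: {question}
Pergunta independente:`;

const QA_PROMPT_PT = `Use os trechos de contexto abaixo para responder à pergunta no final.
Se você não souber a resposta, diga apenas que não sabe. Não invente uma resposta.

{context}

Pergunta: {question}
Resposta útil em português:`;

// Hypothetical helper standing in for a prompt-template library:
// replaces every {name} placeholder that has a value and leaves the rest untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match,
  );
}

// Example: building the prompt that turns a follow-up into a standalone question.
const condensePrompt = fillTemplate(CONDENSE_PROMPT_PT, {
  chat_history: "Humano: Do que trata o contrato?\nAssistente: Da locação de um imóvel.",
  question: "E qual é o prazo?",
});
console.log(condensePrompt);
```

If follow-ups still seem "stuck", printing the filled condense prompt and the standalone question it produces is a cheap way to see whether the history is being used at all.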

stefansms • Apr 04 '23 10:04

Well, my PDF has 200 pages. Could that be a problem? The context (sources) is not at all related to the question, so I guess the problem is in the search for the right context to inject into the prompt.
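When the returned sources are unrelated to the question, the failure happens at retrieval, before the LLM sees anything. Conceptually, the vector store embeds the question and returns the k chunks whose embeddings are most similar to it. The sketch below reproduces that ranking in memory using cosine similarity (a common metric choice); the `Chunk` type, the `topK` helper and the toy vectors are assumptions for illustration, standing in for Pinecone and real embeddings.

```typescript
// Sketch of top-k similarity search. In the real app this happens inside the
// vector store; here the chunks and their embeddings are plain in-memory arrays.
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query embedding, best first,
// together with their scores so the ranking can be inspected.
function topK(query: number[], chunks: Chunk[], k: number): { chunk: Chunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional embeddings; real ones come from an embedding model.
const exampleChunks: Chunk[] = [
  { text: "Cláusula sobre o prazo do contrato", embedding: [0.9, 0.1, 0.0] },
  { text: "Índice do documento", embedding: [0.1, 0.2, 0.9] },
  { text: "Multa por rescisão antecipada", embedding: [0.6, 0.7, 0.1] },
];
for (const { chunk, score } of topK([1, 0, 0], exampleChunks, 2)) {
  console.log(score.toFixed(3), chunk.text);
}
```

Logging the score next to each returned chunk, as in the loop above, is a quick way to tell whether relevant passages are ranked low or are missing from the index entirely.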

luisrock • Apr 04 '23 10:04

I tried hard to make it work in Portuguese, but without success :/ If you manage to get it working, please let me know.

lucastzuka • Apr 05 '23 02:04

@luisrock This may be true, but this limitation should not force the template to respond in English. In my case, I am using gpt-3.5-turbo as the model and have provided more than 2,000 pages of PDFs in Portuguese.

@lucastzuka I've just translated the prompt, as mentioned earlier. I also provided a lot of text in Portuguese.

stefansms • Apr 05 '23 10:04

Again, the injected context is completely wrong. That is the main reason, I guess.

luisrock • Apr 05 '23 12:04

@luisrock maybe check whether the PDFs you are using are protected. I had some context problems in the chat, and by asking questions specific to each document I noticed that one of the 3 PDFs I had loaded was protected. Another possible cause is PDF pages whose text has been converted into images.
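Both causes can leave the loader with little or no text for the affected pages, so those pages contribute empty or near-empty chunks. A rough check, sketched under assumptions: given the per-page text your PDF loader produced, flag pages with almost no visible characters. The `findSuspectPages` name and the 20-character threshold are hypothetical, and a short but legitimate page (a title page, for example) would also be flagged.

```typescript
// Hypothetical sanity check: returns the 1-based numbers of pages whose
// extracted text has fewer than `minChars` non-whitespace characters. Such
// pages are often scanned images or pages the extractor could not read.
function findSuspectPages(pages: string[], minChars = 20): number[] {
  const suspect: number[] = [];
  pages.forEach((text, index) => {
    const visibleChars = text.replace(/\s+/g, "").length;
    if (visibleChars < minChars) suspect.push(index + 1);
  });
  return suspect;
}

// `pages` would normally be the per-page text produced by the PDF loader.
const pages = [
  "This page has plenty of extractable text in it.",
  "   \n  ",
  "",
];
console.log(findSuspectPages(pages)); // [ 2, 3 ]
```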

@stefansms I translated QA_PROMPT into Portuguese and added a line telling it to give its answers in Portuguese. That worked :) even when loading only English documents. Thanks <3
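The change that made the difference here was an explicit instruction about the answer language, which applies regardless of the documents' language. A minimal sketch of that idea, assuming an English template for readability (lucastzuka's version was in Portuguese): `BASE_QA_PROMPT` is an illustrative stand-in for the repository's QA_PROMPT, and `withAnswerLanguage` is a hypothetical helper.

```typescript
// Illustrative stand-in for a QA prompt; not the repository's exact text.
const BASE_QA_PROMPT = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. Don't try to make up an answer.

{context}

Question: {question}
Helpful answer:`;

// Hypothetical helper: inserts an explicit answer-language instruction just
// before the final "Helpful answer:" cue, or appends it if the cue is absent.
function withAnswerLanguage(template: string, language: string): string {
  const cue = "Helpful answer:";
  const instruction = `Always answer in ${language}, even if the context is in another language.\n`;
  const idx = template.lastIndexOf(cue);
  if (idx === -1) return `${template}\n${instruction}`;
  return template.slice(0, idx) + instruction + template.slice(idx);
}

console.log(withAnswerLanguage(BASE_QA_PROMPT, "Portuguese"));
```

Placing the instruction right before the answer cue, so it is the last thing the model reads, is a judgment call; a line anywhere in the prompt may work as well.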

lucastzuka • Apr 05 '23 17:04

Same problem with Turkish: the embedded context is unrelated to the question most of the time.

ahgsql • May 09 '23 21:05

I think this pull request may help: https://github.com/mayooear/gpt4-pdf-chatbot-langchain/pull/77

thiagopachecoit • Jun 19 '23 05:06

It's not about the answers from GPT; the problem is finding the related chunks.
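Retrieval ranks chunks rather than whole documents, so how the text is split at ingestion affects what can be found. The plain splitter below only illustrates fixed-size chunking with overlap; the real ingest step relies on a LangChain text splitter rather than this code, and `splitWithOverlap` and the sizes in the example are hypothetical, not recommended values.

```typescript
// Fixed-size character chunking with overlap: each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one, so text cut at
// a boundary also appears at the start of the next chunk.
function splitWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

console.log(splitWithOverlap("abcdefghij", 4, 2)); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```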

ahgsql • Jun 27 '23 21:06

Hi, @luisrock! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are experiencing poor results when using the tool with a PDF in Portuguese, and you're asking whether the tool works with languages other than English. Other users, such as @yinshipeng and @lucastzuka, have also encountered similar issues with Chinese and Portuguese documents. You suggested that the problem may be related to the search for the right context to inject into the prompt. Additionally, @lucastzuka suggests checking whether the PDFs being used are protected or whether their text pages have been converted into images. @ahgsql mentions a similar problem with Turkish, and @thiagopachecoit suggests a pull request that may help.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the project!

dosubot[bot] • Sep 26 '23 16:09