gpt4-pdf-chatbot-langchain

Use it as translation bot

Open sdugoten opened this issue 1 year ago • 4 comments

First of all, thanks for creating the code that allows people to feed in PDFs.

I was trying to change the prompt to something like

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question."

However, it seems the program does not have the concept of page numbers. When I tell the bot to translate page 1 into English, it picks some random page and translates that instead. I wonder whether this bot can work as a translator from a foreign language into English?

My ultimate goal is to feed in a foreign-language PDF and get back an English PDF that I can download.

Thanks.

sdugoten avatar Mar 20 '23 20:03 sdugoten

Hi, thanks for the feedback.

Based on what you're saying, the translation works well, but not for the page you want? What language is this?

I will add page-number support in a PR soon.

mayooear avatar Mar 21 '23 21:03 mayooear

It's a Japanese light novel. You can try that here https://ufile.io/10rqqw7j

Basically, I feed the PDF into the chatbot and set up the prompt like this: "You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"

Then I asked the question, "Can you translate page 1 of the PDF into English". The bot translates some random page of the PDF. If you ask the chatbot to translate the whole PDF into English, it won't work either.

sdugoten avatar Mar 22 '23 12:03 sdugoten

Generally, OpenAI's embeddings aren't great for multilingual text.

If you ask it to translate from English to Japanese, how is the performance?

mayooear avatar Mar 22 '23 17:03 mayooear

> Generally, OpenAI's embeddings aren't great for multilingual text.
>
> If you ask it to translate from English to Japanese, how is the performance?

You can test this with the court case PDF you provided, using my prompt:

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"

Using your provided PDF as an example, even if you specifically ask GPT to translate page 1 of the PDF, it will still pick one random page from the PDF and translate it into whatever language you asked for. That's why I said it doesn't seem to have the concept of a page. The problem doesn't look like a multilingual one. It's about getting GPT to understand page numbers, so that it can pinpoint exactly the page we refer to and use that as the input for translation.

Perhaps you could add some debug output to the result so that we can see which page GPT is looking at when we ask the question.

sdugoten avatar Mar 22 '23 17:03 sdugoten
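
A minimal sketch of the kind of debug output requested above. It assumes each retrieved chunk records the page it came from, which the project may not store; `RetrievedChunk` and `format_sources` are hypothetical names used only for illustration and are not part of the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    page: int  # hypothetical metadata: 1-based page the chunk was cut from
    text: str


def format_sources(chunks: List[RetrievedChunk], preview: int = 40) -> List[str]:
    """Build one debug line per retrieved chunk, in retrieval order."""
    lines = []
    for rank, chunk in enumerate(chunks, start=1):
        # Show only the start of the chunk, flattened to a single line.
        snippet = chunk.text[:preview].replace("\n", " ")
        lines.append(f"#{rank} page={chunk.page} text={snippet!r}")
    return lines


if __name__ == "__main__":
    retrieved = [RetrievedChunk(7, "Some text\nfrom page seven"),
                 RetrievedChunk(2, "Some text from page two")]
    for line in format_sources(retrieved):
        print(line)
```

Printing these lines next to the answer would show whether the chunks handed to the model actually came from the page named in the question.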

There is no concept of pages because the chunks are currently split by character count. I will add a PR later to split the PDF docs by page number.

mayooear avatar Mar 23 '23 00:03 mayooear
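
The page-based splitting proposed in the last comment could look roughly like the sketch below. It is not the project's implementation: it assumes the PDF text has already been extracted into one string per page, and `PageChunk`, `split_by_page`, `requested_page`, and `select_chunks` are hypothetical helpers. Keeping one chunk per page preserves the page number as metadata, so a question that names a page can be answered from exactly that page instead of from whichever chunks a similarity search happens to return.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageChunk:
    page: int  # 1-based page number, kept as metadata
    text: str


def split_by_page(pages: List[str]) -> List[PageChunk]:
    """Create one chunk per page instead of cutting by character count."""
    return [PageChunk(page=i, text=t) for i, t in enumerate(pages, start=1)]


def requested_page(question: str) -> Optional[int]:
    """Return the page number named in the question, or None if none is named."""
    match = re.search(r"\bpage\s+(\d+)\b", question, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def select_chunks(chunks: List[PageChunk], question: str) -> List[PageChunk]:
    """Filter to the requested page; otherwise return all chunks unchanged."""
    page = requested_page(question)
    if page is None:
        # No page named: the caller would fall back to similarity search here.
        return chunks
    return [c for c in chunks if c.page == page]


if __name__ == "__main__":
    chunks = split_by_page(["text of the first page", "text of the second page"])
    question = "Can you translate page 1 of the PDF into English"
    for chunk in select_chunks(chunks, question):
        print(chunk.page, chunk.text)
```

A long page may still exceed the model's context window, so a real implementation would probably split further within each page while copying the page number onto every sub-chunk.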