
Own dataset

Open PoojaAxis opened this issue 2 years ago • 5 comments

How to use our own data?

PoojaAxis avatar May 05 '23 08:05 PoojaAxis

Yes, that's possible. We haven't added this to our UI yet. You'll have to wait, or do it the old way with the console, YAML configuration files and so on.

You can do all that from the main repo gpt4all: https://github.com/nomic-ai/gpt4all

You have a train script there to retrain the model with your own data.

ParisNeo avatar May 05 '23 12:05 ParisNeo

I have a list of PDF files in a folder, and I would like the bot to read from them and output answers based on the user input. I don't know exactly what to modify in the app.py script.

PoojaAxis avatar May 05 '23 12:05 PoojaAxis

Well, reading PDFs will be handled by an extension. You just need to wait a little bit. For now, you can simply copy the text manually, paste it into the discussion, and then ask questions. Or you can even create a personality that contains the text in its conditioning.

The main issue here is context size. The context is not big enough to read long text. That's a current limitation that may be fixed when Recurrent Transformers become mainstream.
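Until then, one way to work within a small context is to split the copied text into pieces and paste one piece at a time along with the question. Below is a minimal sketch of that idea, not something taken from the webui. The function names `split_into_chunks` and `build_prompt`, the prompt wording, and the word-based limits are all assumptions made for illustration; a real model counts tokens, not words, so the numbers would need tuning.

```python
def split_into_chunks(text, max_words=300, overlap=30):
    """Split text into word-based chunks that overlap slightly.

    max_words and overlap are illustrative values (assumptions),
    standing in for a real token budget.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_prompt(chunk, question):
    """Combine one chunk with the user question (hypothetical prompt layout)."""
    return (
        "Use the following text to answer the question.\n\n"
        f"Text:\n{chunk}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for piece in split_into_chunks(sample, max_words=4, overlap=1):
        print(build_prompt(piece, "Which numbers appear in the text?"))
        print("-" * 20)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of repeating a few words.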

ParisNeo avatar May 05 '23 13:05 ParisNeo

Good Day!

Well, what do you suggest if we were to upload all of the PDF files to the database, or import the contents of all the PDF files into the database?

PoojaAxis avatar May 08 '23 06:05 PoojaAxis

It is possible to do LoRA fine-tuning of your model using a PDF database that you convert to text somehow. But keep in mind that you need the data in discussion format, or at least some examples that resemble a discussion about a text, so that the model doesn't lose its conversational capabilities. Then you can play with the parameters to adjust how much of each kind of data you want to use. It's all an exercise in balance.
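As a rough illustration of the discussion format and the balancing idea, here is a minimal sketch that wraps passages in question/answer records and mixes them with general conversation examples. It assumes the PDF text has already been extracted, and everything in it is hypothetical: the `prompt`/`response` field names, the helper names, and the `domain_fraction` parameter are illustrative and are not the format required by the gpt4all training script, which should be checked in the main repo.

```python
import json
import random


def make_record(passage, question, answer):
    """Wrap a passage and a Q/A pair as one discussion-style example.

    The "prompt"/"response" keys are an assumed layout.
    """
    return {
        "prompt": f"{passage}\n\nQuestion: {question}",
        "response": answer,
    }


def mix_datasets(domain_records, chat_records, domain_fraction=0.5, seed=0):
    """Mix document-based examples with general chat examples.

    domain_fraction is the target share of document-based examples
    in the result (an illustrative knob for the balance).
    """
    if not 0 < domain_fraction <= 1:
        raise ValueError("domain_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Number of chat examples needed to reach the target share.
    n_chat = round(len(domain_records) * (1 - domain_fraction) / domain_fraction)
    n_chat = min(n_chat, len(chat_records))
    mixed = list(domain_records) + rng.sample(list(chat_records), n_chat)
    rng.shuffle(mixed)
    return mixed


def to_jsonl(records):
    """Serialize records as JSON Lines, one example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    domain = [
        make_record("The warranty lasts two years.",
                    "How long is the warranty?", "Two years."),
        make_record("Returns are accepted within 30 days.",
                    "What is the return window?", "30 days."),
    ]
    chat = [{"prompt": f"Hello {i}", "response": "Hi, how can I help?"}
            for i in range(5)]
    print(to_jsonl(mix_datasets(domain, chat, domain_fraction=0.5)))
```

Lowering `domain_fraction` keeps more general conversation in the mix, which is one simple way to reduce the risk of the model losing its conversational behaviour.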

ParisNeo avatar May 08 '23 06:05 ParisNeo