gpt4all icon indicating copy to clipboard operation
gpt4all copied to clipboard

LocalDocs Plugin no longer working

Open Giel111 opened this issue 1 year ago • 13 comments

System Info

GPT4all version v2.4.14

The localdocs plugin is no longer processing or analyzing my pdf files which I place in the referenced folder. I've tried creating new folders and adding them to the folder path, I've reused previously working folders, and I've reinstalled GPT4all a couple times. I've tried this on two pc's (win10 and win11) and with 4-5 models (Falcon, Wizard 1.1 and 1.2, LLama7B and LLama13B). I've also tried different advanced settings.

Previously, the LocalDocs plugin worked fine. after entering a prompt, gpt4all used to show "generating response"..."processing localdocsfolder"..."processing". now it skips the middle step, and I don't get any references or accurate responses based on the provided documents.

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] backend
  • [ ] bindings
  • [ ] python-bindings
  • [ ] chat-ui
  • [ ] models
  • [ ] circleci
  • [ ] docker
  • [ ] api

Reproduction

steps to reproduce the behaviour:

  1. create a folder in winexplorer and add to it files (pdf, word) with information
  2. reference this localdocs folder in settings
  3. ask a specific question of which you know the answer should be clearly formulated in the attached files.

Expected behavior

a concise answer from the LLM with accurate reference to the local files.

Giel111 avatar Aug 21 '23 15:08 Giel111

This may have some bearing on this issue. If not, please consider this for a new bug report.

I installed the latest version of GPT4All on a Windows 11 platform. Installed several models without issue.

However, in testing the software, I added a collections folder located on the D drive in LocalDocs. Assigned a name to the collection. Added the collection. But it didn't show up in the LocalDocs windows.

In trying to resolve this issue, I removed the collection and added a collection on the C drive. Everything worked correctly. The documents were referenced in the chat.

Went back to figure out why my first effort failed.

The answer:

D drive was added as "file:///D:{Path}" where Path is the location of the collection. But when the collection on C Drive is selected, it is entered as "C:{Path}".

So I reselected the collection on the D Drive using the Browse button. I edited the entry to remove "file:///".

Everything seemed to work. There seems to be an issue associated with the referencing of the drives.

DW413 avatar Sep 11 '23 03:09 DW413

It is not working for me on mac either. I wonder if this "feature" ever worked.

amr-cloudforce avatar Sep 16 '23 09:09 amr-cloudforce

LocalDocs not working on PopOS either.

PaulWeiss avatar Sep 17 '23 16:09 PaulWeiss

Updated GPT4All to v2.4.19 yesterday. The path issue, which I mentioned above, seems to be corrected. My current platform is an ASUS Vivobook Notebook/Tablet running Windows 11.

Now I can query collections on C and D drive.


Additional information:

It is easy to jump to a conclusion that something doesn't work based on the way we think something should work. Unfortunately, those creating software have a vision of how something should work that is different than the user's vision. That where documentation comes in and should fill the "expectation gap" between the user and the creator. GPT4All's documentation could be better.

But as a retired software engineer, allow me to provide you some guidance that might help you resolve your issue with GPT4All and productively use this tool.

Notes:

  1. Set up a test collection. Create a folder in your user's space. For Windows, that would be on C drive under your user account. Place 3 pdfs in this folder. The pdfs should be different but have some connection.
  2. Start up GPT4All, allowing it time to initialize. Once initialized, click on the configuration gear in the toolbar. Go to plugins, for collection name, enter Test. Browse to where you created you test collection and click on the folder. Select that folder. Now, add the folder using the Add button. You should see a line added with the collection name Test and the path to the folder. If this is not correct, remove the collection and redo it. Select the show references option. When everything looks correct, exit the dialog box.
  3. Select the LLM model you want to use. I started by using Falcon.
  4. Click on the database symbol and select your collect, Test.
  5. Type in a query specific about the topic in one of the pdfs in the Test collection. Enter query.
  6. Observe the response line. The busy indicator will be displayed and the word processing will be displayed followed by the word Test. This indicates that the response is being generated based on your collection. After the response is displayed, the reference showing the pdf will be displayed.
  7. At this point, you have confirmed that GPT4All recognizes your collection.
  8. Now enter another query. This time about something that is not in the collection. You should see the word processing displayed beside the busy indicator. Test will not be displayed. When the response is displayed there will not be a reference. GPT4All responded to your query using the knowledge base in the model you chose.
  9. These steps confirm normal operation of the Local Docs.

Clearly it is possible to have multiple collections, but I don't know if GPT4All can handle more than one collection per chat. When you create a new chat, the collection you link to that chat is always linked to that chat. In other words, starting a chat with one collection then deciding later to turn if off or add another collection may cause confusion for the AI model. If you exit a chat and return later, it will show the collection that was connected before.

Another point to remember is all the collection documents should be in the collection folder root. Don't use subfolders. GPT4All doesn't seem to handle them well. Also, in a collection, don't mix documents written in different languages.

I hope you find this information useful.

DW413 avatar Sep 18 '23 03:09 DW413

same.. I can add LocalDocs, but not all models use this.

syberx avatar Oct 09 '23 08:10 syberx

Updated GPT4All to v2.4.19 yesterday. The path issue, which I mentioned above, seems to be corrected. My current platform is an ASUS Vivobook Notebook/Tablet running Windows 11.

Now I can query collections on C and D drive.

Additional information:

It is easy to jump to a conclusion that something doesn't work based on the way we think something should work. Unfortunately, those creating software have a vision of how something should work that is different than the user's vision. That where documentation comes in and should fill the "expectation gap" between the user and the creator. GPT4All's documentation could be better.

But as a retired software engineer, allow me to provide you some guidance that might help you resolve your issue with GPT4All and productively use this tool.

Notes:

  1. Set up a test collection. Create a folder in your user's space. For Windows, that would be on C drive under your user account. Place 3 pdfs in this folder. The pdfs should be different but have some connection.
  2. Start up GPT4All, allowing it time to initialize. Once initialized, click on the configuration gear in the toolbar. Go to plugins, for collection name, enter Test. Browse to where you created you test collection and click on the folder. Select that folder. Now, add the folder using the Add button. You should see a line added with the collection name Test and the path to the folder. If this is not correct, remove the collection and redo it. Select the show references option. When everything looks correct, exit the dialog box.
  3. Select the LLM model you want to use. I started by using Falcon.
  4. Click on the database symbol and select your collect, Test.
  5. Type in a query specific about the topic in one of the pdfs in the Test collection. Enter query.
  6. Observe the response line. The busy indicator will be displayed and the word processing will be displayed followed by the word Test. This indicates that the response is being generated based on your collection. After the response is displayed, the reference showing the pdf will be displayed.
  7. At this point, you have confirmed that GPT4All recognizes your collection.
  8. Now enter another query. This time about something that is not in the collection. You should see the word processing displayed beside the busy indicator. Test will not be displayed. When the response is displayed there will not be a reference. GPT4All responded to your query using the knowledge base in the model you chose.
  9. These steps confirm normal operation of the Local Docs.

Clearly it is possible to have multiple collections, but I don't know if GPT4All can handle more than one collection per chat. When you create a new chat, the collection you link to that chat is always linked to that chat. In other words, starting a chat with one collection then deciding later to turn if off or add another collection may cause confusion for the AI model. If you exit a chat and return later, it will show the collection that was connected before.

Another point to remember is all the collection documents should be in the collection folder root. Don't use subfolders. GPT4All doesn't seem to handle them well. Also, in a collection, don't mix documents written in different languages.

I hope you find this information useful.

Thank you! It wasn't clear to me you needed to click the database symbol. I thought I had RTFM, but clearly not. Now all seems to be working with localdocs for me.

zos474 avatar Oct 25 '23 02:10 zos474

I am happy this worked for you. Hope it helps others.

An earlier comment mentioning that localDocs doesn't work with all models is worth investigating. I have only used localDocs with the Falcon model and not had a problem. However, the above procedure should enable anyone using a different model to test it out.

Reports of localDocs not working with other models should be reported as an issue to be investigated.

Good luck.

PS: I have used GPT4All in server mode and found that localDocs works with server mode, too.

DW413 avatar Oct 25 '23 02:10 DW413

1. Set up a test collection. Create a folder in your user's space. For Windows, that would be on C drive under your user account. Place 3 pdfs in this folder. The pdfs should be different but have some connection.

I am curious though, why 3? Does it not work with 1? Not trying to waste time with hypotheticals, I have books I need to query to find information faster and to summarize. I need to work with one pdf at a time.

mesbahulalam avatar Nov 27 '23 10:11 mesbahulalam

I chose 3 so that I could feel comfortable that GPT4All was accessing the localDocs directory. The 3 pdfs were on 3 different but related topics. You can use whatever number you want. Coming from an engineering background, I tend to over test so I'm confident something will meet my needs.

Keep in mind that it is important to ask a question that could be answered with information in the pdf(s) that are in the localDocs.

Hope that helps.

DW413 avatar Nov 27 '23 10:11 DW413

so i added docs in saw it indexed them as it should but it didnt retrive any

nevakrien avatar Dec 12 '23 15:12 nevakrien

tried zos474 solution but it dosent work for me. the model does not use the selected data I am sure the data is indexed so idk whats going on there

nevakrien avatar Dec 12 '23 15:12 nevakrien

The steps I gave above do work. If they are not working for you, you will need to verify that you followed the instructions. The instructions I gave are based on the GPT4All being installed on a Windows 10/11 pc. If you are using a different computer or OS, maybe there is an issue. However, I would expect the instructions to work on any system that can run GPT4All.

DW413 avatar Dec 12 '23 23:12 DW413

tried zos474 solution but it dosent work for me. the model does not use the selected data I am sure the data is indexed so idk whats going on there

Just to be clear, that was not my solution - I was just quoting DW413's. But I did want to make sure you actually checked the box next to your document set via the database symbol. That is the step I missed.

Database

zos474 avatar Dec 13 '23 02:12 zos474

If anyone's still having this issue, please make sure to hit the reload button after you've selected the documents, it fixed it for me, so hopefully it helps you. (Which is to say the button that looks like this:) image

Mvb1122 avatar Jan 29 '24 03:01 Mvb1122

Based on this comment https://github.com/nomic-ai/gpt4all/issues/1449#issuecomment-1947618498, you might want to try to delete localdocs_v1.db as well as the embeddings_v0.dat

ThiloteE avatar Feb 16 '24 01:02 ThiloteE

Also relevant: https://github.com/nomic-ai/gpt4all/issues/1958

ThiloteE avatar Feb 16 '24 02:02 ThiloteE