
Indexing gets stuck if filenames have square brackets

Open ThiloteE opened this issue 1 year ago • 1 comments

Bug Report

Indexing gets stuck if filenames have square brackets.

As mentioned in discord by the user "Synerdata": "When indexing text files with square brackets in their title it seems to clog the embedder which gets stuck on it and returns to 0% until the square brackets are changed to curved ones or removed." and "It was just stuck on them saying it was embedding but was not, and then when I changed the square brackets to curved in the filename it proceeded normally and embedded them."

Steps to Reproduce

  1. Try to use the LocalDocs feature and embed local files
  2. Have files with a filename that contains square brackets, such as [ or ]
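
To make the reproduction easier to repeat, a small script along these lines could generate a folder of test files to point a LocalDocs collection at. This is only a sketch: the function name `make_bracket_test_files`, the filenames, and the file contents are made up for illustration and do not come from the original report.

```python
import tempfile
from pathlib import Path


def make_bracket_test_files(folder: Path) -> list[Path]:
    """Create a few small .txt files, some with square brackets in the name.

    The filenames are hypothetical examples; the original report did not
    include the actual files that got stuck.
    """
    folder.mkdir(parents=True, exist_ok=True)
    names = ["[draft] notes.txt", "report [v2].txt", "plain.txt"]
    created = []
    for name in names:
        path = folder / name
        path.write_text(
            f"Sample text for a LocalDocs indexing test: {name}\n",
            encoding="utf-8",
        )
        created.append(path)
    return created


if __name__ == "__main__":
    # Write the files to a fresh temporary folder and print their paths.
    target = Path(tempfile.mkdtemp(prefix="localdocs_brackets_"))
    for p in make_bracket_test_files(target):
        print(p)
```

Adding the printed folder as a LocalDocs collection should then show whether the bracketed files index differently from `plain.txt`, which serves as a control.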

Expected Behavior

Indexing should not get stuck. The file should get indexed.

Your Environment

  • GPT4All version: 3.0
  • Operating System: Unknown
  • Chat model used (if applicable): Unknown

ThiloteE avatar Jul 12 '24 17:07 ThiloteE

I've just tried that and it made embeddings for me. With GPT4All v3.0.1-dev0, current main: 6e0c0660. Windows 10, built locally.

I simply added square brackets to two test .txt files and made a collection with that. It created the database just fine.

Is there something else to consider?

cosmic-snow avatar Jul 12 '24 18:07 cosmic-snow

Maybe simply renaming the file in itself solved the "being stuck at indexing" issue. We do not know why it got stuck. We would need more feedback from the user, such as sample documents or error messages. Closing, because I personally am not willing to invest more time into this. Since the original user is not willing to create a GitHub account, it is hard to follow up.

ThiloteE avatar Jul 19 '24 10:07 ThiloteE