[BUG]: How to upload a folder with subfolders with files to AnythingLLM?

Open venturaEffect opened this issue 1 year ago • 6 comments

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

Hi, I want to prepare all the folders and their files for another person to use in his AnythingLLM. I'm trying to find a folder on my computer where I can place them, as it seems AnythingLLM doesn't allow uploading folders, only file by file, which is tedious and not very useful if you want to use RAG to its full power. So I would appreciate it if somebody knows where (PATH) to go to upload a folder with all the files for the workspace. It's also not clear what types of files it accepts. The info modal for uploads says "Click to upload or drag and drop. supports text files, csv's, spreadsheets, audio files, and more!", but what does this even mean? I tried to upload a small .rtf file and it hangs forever without success.

BTW, the docs don't explain much. I would appreciate a list of the file types it accepts, because it would save a lot of time.

Appreciate your time.

Are there known steps to reproduce?

No response

venturaEffect avatar Sep 06 '24 22:09 venturaEffect

Subdirectory support is definitely needed. Especially when working on code, the path is crucial due to the MVC structure. For example, even though both files are loaded, when asked to print the method of a route in the controller, the model cannot perform this task because the path of the file added to the vector database does not match the predicted path the model knows. Additionally, there are files with the same name. Therefore, adding the subdirectory feature would be great.

redkit75 avatar Sep 11 '24 09:09 redkit75

I'm glad I'm not the only one looking for a way to achieve this! 😅

I'd really love the ability to select a folder/directory to embed, rather than the existing file-by-file approach. My use-case is that I'd like to RAG on an Obsidian Vault, but I'd also like to do the same kind of thing with source code - basically any arbitrarily deep structure of subdirectories.

Please forgive my naivety - I'm a newcomer to AnythingLLM (and really liking it, thus far). Would the feature/improvement be as simple as implementing the following logic?

  1. Accept a Folder as an Upload target. (Alongside the existing File support.)
  2. Once a Folder is specified as an Upload target, traverse the descendant subdirectories recursively to discover all Files.
  3. Add all discovered Files to the My Documents collection (within AnythingLLM - perhaps as an appropriately named New Folder).

I imagine that this would then enable the user to select any/all of the discovered Files with a single click, and execute a single "Move to Workspace" operation, to have the Files added to the Workspace?
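Steps 1 and 2 of the list above can be sketched in a few lines. This is only a minimal illustration of recursive discovery, not AnythingLLM's code: the function name `discover_files` is hypothetical, and it returns paths relative to the chosen folder so that the subdirectory structure is not lost.

```python
from pathlib import Path


def discover_files(root: str) -> list[str]:
    """Return every file under root as a POSIX-style path relative to root."""
    base = Path(root)
    found = []
    # rglob("*") visits all descendant directories recursively.
    for path in base.rglob("*"):
        if path.is_file():
            found.append(path.relative_to(base).as_posix())
    return sorted(found)
```

Calling `discover_files("/path/to/vault")` on a folder of any depth would yield entries such as `notes/daily/today.md`, which is the information a folder-aware file picker would need to keep.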

Again, total shot in the dark here - I don't mean to trivialise something that could be extraordinarily more complex to implement, in practice. 🙏🏻

Edit: I'm aware of the drag-drop approach for achieving much of what I've described above (it's referenced in this issue thread). The distinction I'd like to highlight is adding support via the File/Folder Picker interface, rather than via the drag-drop mechanism.

The ultimate outcome would be one that preserves the File paths on Upload - i.e. all the subdirectories are carried forward into the My Documents collection within AnythingLLM, rather than flattening out the whole file structure by placing all discovered Files together.

findyourexit avatar Jan 04 '25 23:01 findyourexit

@findyourexit Well put. This issue has been open for so long simply because it isn't exactly trivial but it's by no means an impossible feat or splitting the atom. It's just something that has the opportunity to really cause a mess in the file picker since it only supports 1 level deep folders right now.

The file picker is really the limitation, and if we mess that up it causes a ton of problems downstream. If anyone is wondering why we are sitting on this, it's not that it is not important or that we don't care. It's just a lot more work than a cursory look would suggest.

timothycarambat avatar Jan 06 '25 16:01 timothycarambat

A new "Local folder" Data Connector is needed that collects files recursively from a folder, like the "GitHub Repo" Data Connector does. The "GitHub Repo" connector can create a new folder for all documents and includes the relative path in each collected document's file name. If it is not possible to access a local folder from Docker or the browser, it could be added only to the desktop application.
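The idea of carrying the relative path inside the document's file name can be illustrated as follows. The exact naming scheme the "GitHub Repo" connector uses is not stated in this thread, so the `__` separator and the helper name `flat_name` are assumptions made purely for illustration.

```python
from pathlib import PurePosixPath


def flat_name(relative_path: str, separator: str = "__") -> str:
    """Fold a relative path into one file name (hypothetical naming scheme)."""
    parts = PurePosixPath(relative_path).parts
    return separator.join(parts)


print(flat_name("app/controllers/user.py"))  # app__controllers__user.py
print(flat_name("app/models/user.py"))       # app__models__user.py
```

Two files that share the name `user.py` stay distinct in a flat, one-level document folder, and the original location remains readable from the name. A real connector would also need to handle names that already contain the separator.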

Yuki001 avatar Jan 20 '25 02:01 Yuki001

I am sorry to ask, as a non-programmer. I am setting up my own Ultima Online server, and in the process I am creating new scripts and altering existing ones with several LLMs like Claude and ChatGPT. But to give them the whole picture, they need the ServUO source files (280 MB, over 8,000 files), the ClassicUO source files (33 MB, 787 files) and the ServUO HTML documentation (67 MB, over 14,000 files). Yep, Big Data in small size. I am looking for a way to process this. Claude told me to use AnythingLLM.

TLDR: I searched for subfolders in AnythingLLM, that led me to this page here.

What is the current state of things? Would I be able to process a bigger dataset like this, or are there other tools I need to consider? I actually want to fine-tune/teach the AIs (over API) to understand what each script is doing and what relationships it has to other scripts. It worked on a small scale through the knowledge database, but the limited space and the token context window limit are counterproductive at times.

workingmagic avatar Jan 20 '25 11:01 workingmagic

I would really like to see this feature. I need to include a folder with multiple markdown files in my LLM.

GraniLuk avatar Feb 27 '25 07:02 GraniLuk

This would be a great feature to add to AnythingLLM!

While you are investigating and prioritizing this, is there a good workaround for the time being?

Does AnythingLLM (Desktop) have an API that we could use to upload documents? Then we could create our own scripts that use that API until this feature becomes part of the product.
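As a rough idea of what such a script could look like, here is a sketch that builds a multipart upload request for a single file using only the Python standard library. Everything about the server side is an assumption that should be checked against the API documentation of your own instance: the endpoint path `/api/v1/document/upload`, the form field name `file`, and bearer-token authentication. The helper name `build_upload_request` is hypothetical.

```python
import uuid
from pathlib import Path
from urllib.request import Request


def build_upload_request(base_url: str, api_key: str, file_path: str) -> Request:
    """Build (but do not send) a multipart POST carrying one file."""
    path = Path(file_path)
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name "file" is an assumption about what the server expects.
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        # Assumed endpoint path; verify it before relying on it.
        url=f"{base_url.rstrip('/')}/api/v1/document/upload",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A script could walk a folder recursively, call this helper for each file, and send each request with `urllib.request.urlopen(req)`. Note that only `path.name` is sent here, so the subdirectory structure would still be flattened unless the path is encoded into the file name first.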

Helpful links are appreciated.

Thanks!

Y-Sari avatar May 08 '25 07:05 Y-Sari

The important thing nowadays is to be able to give a good demo, not support actual user workflows.

That seems like the flaw of many projects nowadays. You only need a few flat files for a demo but real data lives in nested folders.

chadananda avatar Jun 07 '25 07:06 chadananda

> The important thing nowadays is to be able to give a good demo, not support actual user workflows.
>
> That seems like the flaw of many projects nowadays. You only need a few flat files for a demo but real data lives in nested folders.

The important thing nowadays is that people would rather complain about open source software and point out how other people should work harder, instead of picking up the task themselves.

This seems like a character flaw in many people nowadays. You only need so much intelligence to add a comment but the real effort is in starting to code.

raoulg avatar Jun 09 '25 10:06 raoulg

As we don't have this right now, we can leverage AnythingLLM's MCP support and use mcp/filesystem. The only annoying thing is having to use @agent when prompting.

thiromi avatar Jun 16 '25 14:06 thiromi

Did this capability get added? In my project, Docker can see all subfolders and files.

However, the AnythingLLM web UI does not display subfolders when I click the arrow or double-click on the mounted folder.

I can only see files at the root level of the mount, not files inside subfolders

bogart99j avatar Nov 30 '25 08:11 bogart99j