[BUG]: File Parsing Fails for URLs Without Explicit File Extensions

Open. angelplusultra opened this issue 2 months ago • 1 comment

How are you running AnythingLLM?

All versions

What happened?

When attempting to pull and parse a file using either the RAG Modal or @agent mode, the process fails if the URL does not explicitly end with a file extension (e.g., .pdf, .csv). This occurs even when the server responds with a correct Content-Type header that identifies the file type.

Example:

The following URL fails to be processed, despite responding with an application/pdf content type:

https://arxiv.org/pdf/2307.10265

Observed Behavior:

The application logs display the following error:


[2] Error processing single file File extension .10265 not supported for parsing and cannot be assumed as text file type.

This error originates from the file extension guard located at:

https://github.com/Mintplex-Labs/anything-llm/blob/89a01492b51a23150b59732166b90ebdd1843c50/collector/processSingleFile/index.js#L58-L72
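
The `.10265` in the error is consistent with the extension being taken from whatever follows the last dot in the URL path. A minimal sketch of that behavior, assuming the extension is derived with something like Node's `path.extname` (the linked guard is not reproduced here, and `extensionFromUrl` is a hypothetical helper used only for illustration):

```javascript
const path = require("node:path");

// Hypothetical helper: derive an "extension" purely from the URL path,
// the way a filename-based guard would see it.
function extensionFromUrl(url) {
  const { pathname } = new URL(url);
  return path.extname(pathname).toLowerCase();
}

// The arXiv ID contains a dot, so the text after it is treated as an extension.
console.log(extensionFromUrl("https://arxiv.org/pdf/2307.10265")); // ".10265"

// A URL that ends in a real extension behaves as expected.
console.log(extensionFromUrl("https://example.com/report.pdf")); // ".pdf"
```

Under that assumption, any URL whose final path segment contains a dot but no real file extension would hit the same guard.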

Expected Behavior:

The system should be able to successfully pull and parse files from URLs that do not explicitly contain a file extension, provided the Content-Type header in the server's response clearly indicates the file's MIME type.
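
One possible shape for this, as a sketch only and not the project's actual code: trust the URL extension when it is a supported type, and otherwise fall back to the MIME type from the Content-Type header. The names `MIME_TO_EXTENSION`, `SUPPORTED_EXTENSIONS`, and `resolveExtension` are hypothetical, and the MIME map is a small illustrative subset rather than the collector's real list of supported types.

```javascript
const path = require("node:path");

// Illustrative subset only; the real list of supported types lives in the collector.
const MIME_TO_EXTENSION = new Map([
  ["application/pdf", ".pdf"],
  ["text/csv", ".csv"],
  ["text/plain", ".txt"],
  ["text/html", ".html"],
]);
const SUPPORTED_EXTENSIONS = new Set(MIME_TO_EXTENSION.values());

// Hypothetical resolver: prefer a supported extension from the URL path,
// otherwise map the Content-Type header to an extension, otherwise return null.
function resolveExtension(url, contentType) {
  const urlExt = path.extname(new URL(url).pathname).toLowerCase();
  if (SUPPORTED_EXTENSIONS.has(urlExt)) return urlExt;

  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  return MIME_TO_EXTENSION.get(mime) ?? null;
}

// The URL from this report: ".10265" is not supported, so the header decides.
console.log(resolveExtension("https://arxiv.org/pdf/2307.10265", "application/pdf")); // ".pdf"
```

Returning `null` for unknown types would let the existing guard keep rejecting files that neither the URL nor the header can identify.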


Are there known steps to reproduce?

No response

angelplusultra, Oct 08 '25 19:10

I would like to work on this @timothycarambat

Guru6163, Oct 09 '25 01:10