RFE: Accept file:// for URI for RAGDocument

Open • dkennetzoracle opened this issue 4 months ago • 1 comment

🚀 Describe the new functionality needed

Hi llama-stack team, I built an application with a portal and a backend that processes PDFs supplied either as a URL or as a file uploaded to the server.

The URL path works with the RAGDocument type, but when I pass it a file:// URI, I get this error:

route='/v1/tool-runtime/rag-tool/insert' method='post': Request URL is missing an 'http://' or 'https://' protocol.

This is a fairly common use case for teams doing document ingestion.

💡 Why is this needed? What if we don't build it?

This is needed because I now have two different PDF ingestion paths: one through llama-stack and one I hand-rolled myself, which will produce inconsistencies in my vector database. Supporting file:// would also broaden what llama-stack can do.

If you don't build it, people will need workarounds for something llama-stack is already largely capable of.

Other thoughts

No response

dkennetzoracle • Sep 16 '25 20:09

This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.

github-actions[bot] • Nov 16 '25 00:11

/assign

This is a legitimate bug that confuses users. It exists in two locations:

In vector_store.py:144, the content_from_doc() function uses a regex pattern that matches file://:

    pattern = re.compile("^(https?://|file://|data:)")

However, lines 148-152 then call httpx.AsyncClient().get(), which fails for file:// URLs.
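To make the mismatch concrete, here is a small stdlib-only sketch that runs the regex quoted above alongside a check for the schemes an HTTP client can fetch. The helper names `passes_regex` and `http_fetchable` are illustrative only and do not exist in llama-stack; treating `{"http", "https"}` as the fetchable set is an assumption about the httpx call path.

```python
import re
from urllib.parse import urlparse

# Regex as quoted from vector_store.py:144 in the comment above.
pattern = re.compile("^(https?://|file://|data:)")

# Assumption: the schemes the httpx.AsyncClient().get() path can actually fetch.
HTTP_SCHEMES = {"http", "https"}


def passes_regex(uri: str) -> bool:
    """True if the quoted regex treats the string as a URI."""
    return pattern.match(uri) is not None


def http_fetchable(uri: str) -> bool:
    """True if the URI's scheme is one an HTTP client can retrieve."""
    return urlparse(uri).scheme in HTTP_SCHEMES


for uri in [
    "https://example.com/report.pdf",
    "file:///tmp/report.pdf",
    "/tmp/report.pdf",
]:
    # Only the file:// entry passes the regex while failing the HTTP check.
    print(f"{uri:<32} regex={passes_regex(uri)!s:<5} http={http_fetchable(uri)}")
```

The gap is exactly the row where the regex says "yes" and the HTTP check says "no": the URI is accepted as valid input and then handed to a client that cannot retrieve it.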

In memory.py:63-68, the raw_data_from_doc() function has the same issue: it special-cases data: URLs but passes file:// URLs directly to httpx.
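Since data: URLs are already special-cased, one possible client-side stop-gap until file:// is supported is to inline the local file as a base64 data: URI (RFC 2397) and send that instead of a file:// URI. This is a sketch that assumes the rag-tool insert route accepts a data: URI wherever it accepts an http(s) URI; `file_to_data_uri` is a hypothetical helper, not a llama-stack API.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def file_to_data_uri(path: str, mime_type: Optional[str] = None) -> str:
    """Encode a local file as a base64 data: URI (RFC 2397).

    Hypothetical helper for illustration; not part of llama-stack.
    """
    p = Path(path)
    if mime_type is None:
        # Guess from the file extension, e.g. ".pdf" -> "application/pdf".
        guessed, _encoding = mimetypes.guess_type(p.name)
        mime_type = guessed or "application/octet-stream"
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Because the bytes still go through llama-stack's own ingestion, this keeps a single processing path, at the cost of base64 inflating the request by roughly a third.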

Security considerations: enabling file:// support introduces risks that must be addressed:

- Path traversal attacks: malicious URIs like file:///etc/passwd or file:///../../../etc/shadow could expose sensitive system files.
- Symlink following: files could be symlinks to sensitive locations.
- Directory listing: requests for directories should be explicitly rejected.
- Network attacks: file:// URIs pointing to network shares could be exploited.
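The mitigations listed above can be combined into a single resolution step. The following is a minimal sketch, assuming an operator-configured directory that file:// URIs must stay inside; `resolve_file_uri` and `allowed_root` are hypothetical names, not existing llama-stack code. It refuses URIs with a remote host, resolves the path (collapsing `..` and following symlinks) before checking containment, and rejects anything that is not a regular file.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_file_uri(uri: str, allowed_root: Path) -> Path:
    """Map a file:// URI to a regular file inside allowed_root, or raise ValueError.

    Hypothetical helper; requires Python 3.9+ for Path.is_relative_to().
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URI: {uri!r}")

    # Network attacks: a host component (file://server/share/...) may refer to
    # a remote share, so only an empty host or "localhost" is accepted.
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(f"remote host not allowed: {parsed.netloc!r}")

    root = allowed_root.resolve()
    # Path traversal and symlinks: resolve() collapses ".." segments and follows
    # symlinks, so the containment check sees the real target, not the alias.
    target = Path(url2pathname(parsed.path)).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes allowed root: {target}")

    # Directory listing: reject directories and anything that is not a regular file.
    if not target.is_file():
        raise ValueError(f"not a regular file: {target}")
    return target
```

This does not close the time-of-check/time-of-use gap between validating the path and opening it, so the allowed root should not be writable by untrusted users.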

r-bit-rry • Dec 02 '25 06:12