Remove Hard Dependency of unstructured.io API key

Open Brandonagil opened this issue 6 months ago • 3 comments

Unstructured.io apparently no longer provides free API keys?!

Brandonagil, May 24 '25

@Brandonagil Would something like LlamaParse work for your use case? It's not open source, though.

MODSetter, May 26 '25

@MODSetter There is Aryn DocParse, which uses the open source Sycamore project.

Matheus-Garbelini, May 29 '25

@Brandonagil LMK if this https://github.com/MODSetter/SurfSense/pull/123 solves this issue for now?

MODSetter, May 31 '25

By the way, wouldn't it be easier to write your own file-parsing function instead of relying on the Unstructured or LlamaParse APIs? Unstructured is open source, so it would be enough to add a Python file that processes uploaded files, and RAG over documents would then work offline, for free, and without any limits.

eyo4eh, Jun 09 '25
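For readers weighing the local-parsing suggestion above, here is a minimal sketch of what such a Python file might contain. It is not SurfSense code: `extract_text`, `parse_file`, and `chunk_text` are hypothetical names, only plain text, Markdown, and HTML are handled (using the standard library), and the 800/100 character chunk sizes are arbitrary. In a real setup, the open-source `unstructured` package's `unstructured.partition.auto.partition` function would be the natural replacement for the stdlib branches, since it also covers PDFs and Office formats.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(raw: str, suffix: str) -> str:
    """Hypothetical local parser: turn raw file contents into plain text by extension."""
    suffix = suffix.lower()
    if suffix in (".txt", ".md"):
        return raw  # already plain text
    if suffix in (".html", ".htm"):
        extractor = _TextExtractor()
        extractor.feed(raw)
        extractor.close()
        return "\n".join(extractor.parts)
    # PDFs, DOCX, etc. would need a real parser such as unstructured's partition()
    raise ValueError(f"unsupported file type: {suffix}")


def parse_file(path: str) -> str:
    """Read an uploaded file from disk and extract its text."""
    p = Path(path)
    return extract_text(p.read_text(encoding="utf-8", errors="replace"), p.suffix)


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size character windows for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Each window starts `step` characters after the previous one; the range end
    # stops before a window would fall entirely inside its predecessor.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Fixed-size character windows with overlap are the simplest chunking strategy; splitting on sentences or headings usually retrieves better but needs more code.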

Running Unstructured offline is not an easy task, and most SurfSense users are not technical enough to write their own parsing code. However, you can still use a self-hosted, offline instance of Unstructured by pointing base_url at it and leaving the API key empty. Docling support is planned to extend the local ETL options: https://github.com/MODSetter/SurfSense/issues/161

MODSetter, Jun 09 '25
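As a rough illustration of the base_url approach, the sketch below posts a file to Unstructured's partition endpoint (`/general/v0/general`, multipart field `files`) and adds the `unstructured-api-key` header only when a key is supplied. It assumes a self-hosted unstructured-api container listening on `http://localhost:8000`; `build_partition_request` and `partition_via_api` are hypothetical names, and SurfSense's actual setting names are not shown in this thread.

```python
import json
import uuid
import urllib.request
from typing import Optional

# Path of Unstructured's general partition endpoint (hosted and self-hosted API).
PARTITION_PATH = "/general/v0/general"


def build_partition_request(
    filename: str,
    content: bytes,
    base_url: str = "http://localhost:8000",  # assumption: local unstructured-api container
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a multipart POST for the partition endpoint; the key header is optional."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
    }
    if api_key:  # a local instance needs no key, so the header is simply omitted
        headers["unstructured-api-key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + PARTITION_PATH, data=body, headers=headers, method="POST"
    )


def partition_via_api(filename: str, content: bytes, **kwargs) -> list[dict]:
    """Send the request and return the JSON list of elements (each has a "text" field)."""
    request = build_partition_request(filename, content, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a local instance, `partition_via_api("notes.txt", data)` needs no key; against the hosted service you would pass its URL as `base_url` together with `api_key`.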