onyx: Questions about the project (DB, chunk size, OpenAI model)

Hello, I'm trying out the project and I'm curious which database it uses and where it is deployed. Many thanks!

Aug 28 '23 00:08 jignnsd

Hi! Postgres (as relational DB) + Vespa (as vector DB). Up until recently though, we used a combination of Qdrant + Typesense instead of Vespa.

Everything is stored on the machine where you deploy Danswer. There is no call-home functionality (meaning we never send any user data back to us, not even usage or telemetry data).

Aug 28 '23 00:08 yuhongsun96

Many thanks, @yuhongsun96! Also, is there a way to change the chunk size and overlap? And how can I change the OpenAI model that's used (GPT-3.5, GPT-3.5-16k, GPT-4, etc.)? Many thanks

Aug 28 '23 01:08 jignnsd

To change the size of chunks and overlap you'd have to change the values here and build a new container: https://github.com/danswer-ai/danswer/blob/main/backend/danswer/configs/app_configs.py#L139

It's not configurable via environment variables because we don't recommend people mess with it. For example, if you increase the chunk size, the embeddings may start losing context, because text beyond the embedding model's context limit gets cut off. But feel free to play with it!
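
To make the chunk size / overlap trade-off concrete, here is a minimal Python sketch. It is not Danswer's chunker: it splits on whitespace as a stand-in for a real tokenizer, and the names chunk_tokens, visible_to_embedder, chunk_size, overlap and max_model_tokens are hypothetical, not the constants in app_configs.py. It shows how overlap makes consecutive chunks share tokens, and how the part of an oversized chunk beyond the embedding model's limit would never reach the model (assuming the model truncates long input, as many embedding models do).

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    # Hypothetical helper: sliding window of `chunk_size` tokens, where each
    # window repeats the last `overlap` tokens of the previous one.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def visible_to_embedder(chunk: list[str], max_model_tokens: int) -> list[str]:
    # Assumption: the embedding model silently truncates input longer than its
    # context limit, so only this prefix of the chunk influences the vector.
    return chunk[:max_model_tokens]


tokens = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)

big_chunk = chunk_tokens(tokens, chunk_size=10, overlap=0)[0]
print(visible_to_embedder(big_chunk, max_model_tokens=6))
```

With chunk_size=4 and overlap=1 the ten words become three chunks, each starting with the last word of the previous one. With a single 10-word chunk and a 6-token limit, the last four words never influence the embedding, which is the kind of context loss described above.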

For how to configure different models, you can check this: https://docs.danswer.dev/gen_ai_configs/open_ai

Aug 28 '23 04:08 yuhongsun96

Perfect, @yuhongsun96, I'll read through the docs.

Aug 28 '23 20:08 jignnsd

Is there a reason you guys switched from Qdrant to Vespa? What were the pros and cons?

Nov 10 '23 08:11 TaridaGeorge

Yes, we moved to Vespa because it has features we needed that Qdrant didn't support:

- multiple vectors per document
- custom scoring functions, allowing us to do time-based decay, learning from feedback, etc.
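
As a rough illustration of what those two features enable, here is a minimal Python sketch of the scoring idea. It is not Vespa's rank-profile syntax and not Danswer's actual ranking: the max-over-vectors aggregation, the half-life decay formula and the feedback_boost multiplier are all assumptions chosen for clarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_document(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    age_days: float,
    half_life_days: float = 365.0,
    feedback_boost: float = 1.0,
) -> float:
    # Multiple vectors per document: the document scores as well as its
    # best-matching vector (e.g. one vector per chunk).
    best = max((cosine(query_vec, v) for v in doc_vecs), default=0.0)
    # Hypothetical time decay: the score halves every `half_life_days`.
    decay = 0.5 ** (age_days / half_life_days)
    # Hypothetical feedback signal: > 1.0 for documents users upvoted,
    # < 1.0 for downvoted ones.
    return best * decay * feedback_boost


query = [1.0, 0.0]
fresh_doc = [[0.0, 1.0], [1.0, 0.0]]
old_doc = [[1.0, 0.0]]
print(score_document(query, fresh_doc, age_days=0))
print(score_document(query, old_doc, age_days=365))
```

Both documents contain a vector that exactly matches the query, but the year-old one ends up with half the score of the fresh one. In Vespa, this kind of logic is expressed in a rank profile and evaluated by the engine itself, rather than in application code after retrieval.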

Nov 10 '23 18:11 yuhongsun96