Questions about the project: DB, chunk size, OpenAI model

Open jignnsd opened this issue 2 years ago • 6 comments

Hello, I'm trying out the project and I'm curious which database it uses and where it is deployed. Many thanks

jignnsd avatar Aug 28 '23 00:08 jignnsd

Hi! Postgres (as relational DB) + Vespa (as vector DB). Up until recently though, we used a combination of Qdrant + Typesense instead of Vespa.

Everything is stored on your machine where you deploy Danswer. There are no call-home functionalities (meaning we never send any user data back to us, not even usage or telemetry data).

yuhongsun96 avatar Aug 28 '23 00:08 yuhongsun96

Many thanks @yuhongsun96. Also, is there a way to change the chunk size and overlap? And how can I change which OpenAI model is used (gpt3.5, gpt3.5-16k, gpt4, etc.)? Many thanks

jignnsd avatar Aug 28 '23 01:08 jignnsd

To change the size of chunks and overlap you'd have to change the values here and build a new container: https://github.com/danswer-ai/danswer/blob/main/backend/danswer/configs/app_configs.py#L139

It's not configurable via environment variables because we don't recommend people mess with it. For example, if you increase the chunk size, you may start losing context in the embeddings because of the embedding model's context limit. But feel free to play with it!
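For anyone wondering what the two values control, here is a minimal sketch of overlapping chunking. It is not Danswer's actual chunker: the function name `chunk_tokens`, the parameter names, and the default values are all hypothetical, and the real settings live in the config file linked above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int = 512, overlap: int = 64) -> List[List[T]]:
    """Split a token sequence into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens. All names and defaults here
    are hypothetical and only illustrate the idea.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end, so no trailing
        # chunk made only of already-covered tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With `chunk_size=4` and `overlap=1`, ten tokens come out as three chunks, each sharing one token with its neighbor. A larger `chunk_size` gives fewer, longer chunks, which is where the embedding model's input limit starts to matter.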

For how to configure different models, you can check this: https://docs.danswer.dev/gen_ai_configs/open_ai

yuhongsun96 avatar Aug 28 '23 04:08 yuhongsun96

Perfect @yuhongsun96 I'll read the info

jignnsd avatar Aug 28 '23 20:08 jignnsd

Is there a reason you guys ditched Qdrant in favor of Vespa? What were the pros and cons?

TaridaGeorge avatar Nov 10 '23 08:11 TaridaGeorge

Ya, we went to Vespa because they had features we needed that Qdrant didn't support:

  • multiple vectors per document
  • custom scoring functions allowing us to do time-related decay, learning from feedback, etc.
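
To make the two points concrete, here is a rough Python sketch of the idea. It is not Vespa's ranking expression syntax and not Danswer's actual scoring: the function `score_document`, the half-life decay, and the multiplicative `feedback_boost` are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_document(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],  # several vectors for one document
    age_days: float,
    feedback_boost: float = 1.0,   # hypothetical: >1 if users upvoted, <1 if downvoted
    half_life_days: float = 365.0,  # hypothetical: score halves after this many days
) -> float:
    """Hypothetical score: best vector match, decayed by age, scaled by feedback."""
    if not doc_vecs:
        return 0.0
    # Multiple vectors per document: keep the closest one to the query.
    best_similarity = max(cosine(query_vec, v) for v in doc_vecs)
    # Time-related decay: exponential with the given half-life.
    decay = 0.5 ** (age_days / half_life_days)
    return best_similarity * decay * feedback_boost
```

In this sketch a document that matches the query perfectly but is one half-life old scores the same as a fresh document with similarity 0.5. The point is only that the ranking combines vector similarity with other signals inside a single scoring function.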

yuhongsun96 avatar Nov 10 '23 18:11 yuhongsun96