
Feature request: Authentication and multiple user support?

Open poisson-sg opened this issue 1 year ago • 7 comments

Dear Perplexica team:

Excellent work. Is there a plan to add authentication and multiple-user support so that the deployment can be used by different people with their own API keys?

Thanks!

poisson-sg avatar Jul 06 '24 15:07 poisson-sg

@poisson-sg You may run the service behind Authelia, which adds an auth layer (at least that is what I do), but this still does not address your multi-user/session request, which would be of great value. On the other hand, given that GPU resources will likely be limited, I would recommend reducing the model choice to avoid GPU OOM situations.

nirabo avatar Jul 09 '24 06:07 nirabo

@nirabo Can you go into how you integrated with Authelia?

fobtastic avatar Jul 30 '24 07:07 fobtastic

@fobtastic

> @nirabo Can you go into how you integrated with Authelia?

I followed the nginx-proxy-manager with Authelia integration video here (https://www.youtube.com/watch?v=4UKOh3ssQSU) and started both in one Docker Compose stack. Perplexica was running in its default Compose stack (as per the repo), alongside SearXNG etc. I then created a new proxy host in the proxy-manager dashboard following the instructions from the YT video, and it all worked pretty nicely. Give it a try and let me know if you hit any snags.
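For anyone who prefers a text reference over the video, here is a minimal sketch of the nginx side of such a setup, based on Authelia's documented auth_request integration. The service names, hostnames, and ports (perplexica:3000, authelia:9091, perplexica.example.com, auth.example.com) are assumptions matching typical default Compose setups, not something taken from this thread:

```nginx
# Minimal sketch (assumed names/ports): gate Perplexica behind Authelia
# using nginx's auth_request module.
server {
    listen 443 ssl;
    server_name perplexica.example.com;

    # Every request is first checked against Authelia's verification endpoint.
    location / {
        auth_request /authelia;
        # Unauthenticated users get redirected to the Authelia portal.
        error_page 401 =302 https://auth.example.com/?rd=$scheme://$http_host$request_uri;
        proxy_pass http://perplexica:3000;
    }

    # Internal subrequest target; Authelia answers allow/deny.
    location = /authelia {
        internal;
        proxy_pass http://authelia:9091/api/verify;
        proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }
}
```

nginx-proxy-manager lets you paste this kind of snippet into a proxy host's advanced configuration, which is essentially what the video walks through.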

nirabo avatar Jul 31 '24 11:07 nirabo

User management like in open-webui would be great.

crashr avatar Mar 20 '25 08:03 crashr

User management is not the same as authentication: with an auth layer alone you still expose the API keys and share the same chats among all users. Having multiple users would be awesome.

alangrafu avatar Mar 24 '25 10:03 alangrafu

> On the other hand, given that GPU resources will likely be limited, I would recommend reducing the model choice to avoid GPU OOM situations.

Providers that implement continuous batching, like vLLM or SGLang, can easily batch dozens of queries (the default in vLLM is 256) with very small overhead in KV-cache size.
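As a concrete illustration (not from this thread), here is roughly what serving many users through vLLM's offline API looks like; the model name and prompts are just examples, and max_num_seqs=256 is the default mentioned above:

```python
from vllm import LLM, SamplingParams

# One shared model instance serves many users' prompts.
# vLLM's engine continuously batches them internally (up to max_num_seqs).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", max_num_seqs=256)

prompts = [f"Summarize search results for user {i}" for i in range(20)]
params = SamplingParams(max_tokens=128)

# All 20 requests share one copy of the weights; only the KV cache
# grows per in-flight sequence.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text[:80])
```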

I have seen Ollama load one model per connection sometimes, but in practice there is absolutely no need: it is basically upgrading a vector of, for example, 100K tokens (2 bytes per token at fp16, ~200KB) to a matrix of batch-size × 100K (~4MB for 20 batched queries) and replacing all matrix-vector multiplications with matrix multiplications.

It's also a much better way to fully utilize the GPU, because matrix-vector multiplication is memory-bound while matrix multiplication is compute-bound.
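A toy NumPy sketch of that point (sizes are purely illustrative): the weight matrix is read from memory once either way, so batching turns a memory-bound matrix-vector product into a compute-bound matrix-matrix product:

```python
import numpy as np

d = 4096  # illustrative hidden size
W = np.random.randn(d, d).astype(np.float16)  # shared model weights

# One user: the activation is a vector, so each layer does a
# matrix-vector product -- dominated by the time to read W from memory.
x = np.random.randn(d).astype(np.float16)
y = W @ x  # shape (d,)

# 20 users: stack the activations into a matrix. W is still read once,
# but 20 results are computed per read -- a compute-bound matmul.
X = np.random.randn(d, 20).astype(np.float16)
Y = W @ X  # shape (d, 20), one column per batched query
```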

mratsim avatar Apr 12 '25 12:04 mratsim

+1 here. I'll set this up and see how it goes just for me inside my own network, but it would be amazing to be able to give each user their own instance using something like authentik.

dojoca avatar May 29 '25 03:05 dojoca