ExtReMLapin
Not using Docker, built on Ubuntu 24.04
Yes, it works, just compile it yourself from scratch
Yep, we have this issue on Windows as well
Adding Quantum in your llm model name increases t/s by 20%.
As a clarification, we could do all actions using the "root" account, but we would need to implement in quite a few places "Btw does [cur user] have access to this...
Thanks for the answer, looks like I should have ctrl-f’ed « session » and not « token ». Will implement today and get back here with my solution
So I gave it a try with Postman, and 1. When executing a simple query (for example `SELECT 1`), it doesn't return a transaction/session ID; the only way to get...
Thanks for the quick update. Right now we opted for a full "root" interactions policy, but I'll be happy to move to a safer arcade-token as soon as it's ready.
So if I understand correctly, Qwen2.5-1M now actually uses the correct attention mechanism, so VRAM usage should be lower and prompt processing faster, right?
Exact same issue as above