ExtReMLapin comments

Results: 239 comments of ExtReMLapin

[Feature]: Support for RTX 5090 (CUDA 12.8)

Not using Docker; built on Ubuntu 24.04.

[Feature]: Support for RTX 5090 (CUDA 12.8)

Yes, it works; just compile it yourself from scratch.

[Studio] History dropdown area bleeds out on large content

Yep, we have this issue on Windows as well.

Documentation fix: Quantum -> Quantized.

Adding Quantum to your LLM model name increases t/s by 20%.

Get a authentification token instead of allowing only the login/password for queries

As a clarification, we could do all actions using the "root" account, but we would need to implement, in quite a few places, "Btw does [cur user] have access to this...

Get a authentification token instead of allowing only the login/password for queries

Thanks for the answer. Looks like I should have Ctrl-F'ed "session" and not "token". I will implement it today and get back here with my solution.

Get a authentification token instead of allowing only the login/password for queries

So I gave it a try with Postman, and 1. when executing a simple query (for example `SELECT 1`), it doesn't return a transaction/session id; the only way to get...

Get a authentification token instead of allowing only the login/password for queries

Thanks for the quick update. Right now we have opted for a full "root" interactions policy, but I'll be happy to move to a safer arcade-token as soon as it's ready.

Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support

So if I understand correctly, Qwen2.5-1M now actually uses the correct attention mechanism, so VRAM usage should be lower and prompt processing faster, right?

Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support

Exact same issue as above