cpwan
@dosu the original implementation considers only the semantic embedding case, yet there is now keyword search in the economical index method. The page number information should not be stored with the...
This one works, but it does not allow the user to toggle it in the chat config.
```python
from pydantic import BaseModel, Field
from typing import Optional

class Filter:
    class Valves(BaseModel):
        enable_thinking: bool =...
```
> > > > > The "Think" or "Reason" toggles are becoming fairly standard on AI chat interfaces and apps alike. I think a toggle such as that (or potentially...
I ran some benchmarks with [LLMPerf](https://github.com/ray-project/llmperf), using 150 requests of 1000 input tokens and 500 output tokens (CUDA 12.4, NVIDIA driver 550). |GPU Model |LLM model| vLLM version| 1 concurrent request...
**vLLM**: v0.8.5
**Model**: Qwen/Qwen3-30B-A3B
**Hardware**: A10*4, 96GB VRAM

It gives OOM even when I set `max-model-len` to 1024 with `max-num-seqs` = 1. It works with `enforce-eager`, giving 20 tokens per second.

Logs:
```
qwen3-1...
```
Alright, here is my suggested change to the documentation for clarity:
```diff
path : string, int, pathlib.Path, soundfile.SoundFile, audioread object, or file-like object
    path to the input file.
    Any codec supported...
```
So, it is **not** likely that librosa will support loading `webm` from a file-like object. Right? 😢
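If loading from a file-like object is not an option, one possible workaround is to spool the bytes to a temporary file and load from its path instead. The sketch below is only an illustration of that idea, not something confirmed in this thread: `load_via_temp_file` is a hypothetical helper, and it assumes that path-based loading of `webm` works in your environment (for example through an ffmpeg-backed decoder).

```python
import os
import shutil
import tempfile
from typing import Any, BinaryIO, Callable


def load_via_temp_file(
    fileobj: BinaryIO,
    loader: Callable[[str], Any],
    suffix: str = ".webm",
) -> Any:
    """Hypothetical helper: copy `fileobj` to a temp file and pass its path to `loader`."""
    # delete=False so the file survives being closed and can be reopened by path
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            # Stream the bytes to disk; the `with` block closes (and flushes) the file
            shutil.copyfileobj(fileobj, tmp)
        # The loader sees an ordinary path, e.g. loader=librosa.load
        return loader(tmp.name)
    finally:
        # Remove the temporary file whether or not loading succeeded
        os.remove(tmp.name)
```

With librosa installed, this could be called as `y, sr = load_via_temp_file(buf, librosa.load)`, where `buf` is the in-memory `webm` data. Whether decoding succeeds still depends on the backends available on the machine.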