llm-vscode-inference-server

An endpoint server for efficiently serving quantized open-source LLMs for code.

Results: 4 llm-vscode-inference-server issues, sorted by recently updated

I saw the README references running on CPU as a goal. Is the project there right now, or is there still work to be done to achieve that? Currently I'm...
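For reference, a minimal CPU-only sketch using plain Hugging Face `transformers` (not this project's vLLM-based server; the model name is only an example) might look like this:

```python
# Minimal CPU sketch with Hugging Face transformers (illustrative only;
# llm-vscode-inference-server itself builds on vLLM, which targets GPUs).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoderbase-1b"  # assumption: any small code model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads on CPU by default

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```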

- Already posted on https://github.com/vllm-project/vllm/issues/1479
- My GPU is an RTX 3060 with 12 GB VRAM
- My target model is [CodeLlama-7B-AWQ](https://huggingface.co/TheBloke/CodeLlama-7B-AWQ), whose size is...
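If it helps, loading an AWQ-quantized model through vLLM's Python API is roughly the following. This is a sketch assuming a vLLM build with AWQ support; the memory-utilization value is only a guess for a 12 GB card, not a tested setting.

```python
# Sketch: load TheBloke/CodeLlama-7B-AWQ with vLLM's AWQ kernels on a 12 GB GPU.
# The gpu_memory_utilization value is an assumption, not a verified setting.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/CodeLlama-7B-AWQ",
    quantization="awq",           # use the AWQ weights instead of FP16
    gpu_memory_utilization=0.90,  # leave some headroom on a 12 GB card
)

params = SamplingParams(temperature=0.2, max_tokens=64)
result = llm.generate(["def quicksort(arr):"], params)
print(result[0].outputs[0].text)
```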

I keep getting FIM tokens when it responds back. Am I supposed to scrub these directly in the code, or is there some setting that has to be used in...
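Until that's clarified, one client-side workaround could be to strip the FIM sentinel tokens from the completion before using it. A rough sketch; the token list below is an assumption based on common StarCoder/CodeLlama conventions, not something the server is documented to emit:

```python
# Hypothetical post-processing: remove FIM sentinel tokens the server may echo back.
FIM_TOKENS = ("<fim_prefix>", "<fim_middle>", "<fim_suffix>", "<|endoftext|>")

def scrub_fim(text: str) -> str:
    for token in FIM_TOKENS:
        text = text.replace(token, "")
    return text

print(scrub_fim("<fim_middle>    return a + b<|endoftext|>"))
```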

While building a service with Docker, the following error is raised.

**output**

```
291.9 RuntimeError:
291.9 The detected CUDA version (12.1) mismatches the version that was used to compile
291.9 PyTorch...
```
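A quick way to confirm the mismatch inside the image is to compare the CUDA version PyTorch was built against with the toolkit the base image ships. This is a diagnostic sketch that assumes `torch` and `nvcc` are both present in the container:

```python
# Diagnostic sketch: print the CUDA version PyTorch was compiled with and the
# toolkit version available in the image, to see where the 12.1 mismatch comes from.
import subprocess

import torch

print("PyTorch built against CUDA:", torch.version.cuda)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```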