llm-vscode-inference-server
An endpoint server for efficiently serving quantized open-source LLMs for code.
I saw the README references running on CPU as a goal. Is the project there right now, or is there still work to be done to achieve that? Currently I'm...
- Already posted on https://github.com/vllm-project/vllm/issues/1479
- My GPU is an RTX 3060 with 12GB VRAM
- My target model is [CodeLlama-7B-AWQ](https://huggingface.co/TheBloke/CodeLlama-7B-AWQ), whose size is
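For reference, a minimal sketch of loading an AWQ-quantized CodeLlama model with vLLM on a 12GB card might look like the following. The specific values (`gpu_memory_utilization`, `max_model_len`, the sampling settings) are assumptions chosen to keep the weights plus KV cache within VRAM, not settings taken from the issue.

```python
# Hypothetical sketch: serving CodeLlama-7B-AWQ with vLLM on a 12GB GPU.
# The flag values below are illustrative assumptions; tune max_model_len and
# gpu_memory_utilization to fit the available VRAM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/CodeLlama-7B-AWQ",
    quantization="awq",            # load the AWQ-quantized weights
    dtype="half",                  # fp16 activations
    gpu_memory_utilization=0.90,   # leave a little headroom on a 12GB card
    max_model_len=4096,            # cap the context to bound the KV cache
)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)
```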
I keep getting FIM tokens when it responds back. Am I supposed to scrub these directly in the code, or is there some setting that has to be used in...
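If no server-side setting covers it, one hedged approach is to scrub the FIM control tokens from the completion before returning it to the editor. The token list below is an assumption mixing CodeLlama-style (`<PRE>`, `<SUF>`, `<MID>`, `<EOT>`) and StarCoder-style (`<fim_*>`) markers; the actual set depends on the deployed model's tokenizer, and vLLM's `skip_special_tokens` sampling option may already remove any tokens the tokenizer registers as special.

```python
# Hypothetical post-processing sketch: strip FIM special tokens from a
# completion before sending it back to the editor. Trim FIM_TOKENS down to
# whatever markers the deployed model's tokenizer actually emits.
import re

FIM_TOKENS = [
    "<PRE>", "<SUF>", "<MID>", "<EOT>",                      # CodeLlama-style
    "<fim_prefix>", "<fim_suffix>", "<fim_middle>",          # StarCoder-style
    "<|endoftext|>",
]

_FIM_PATTERN = re.compile("|".join(re.escape(tok) for tok in FIM_TOKENS))

def scrub_fim_tokens(completion: str) -> str:
    """Remove FIM control tokens that leaked into the generated text."""
    return _FIM_PATTERN.sub("", completion)

print(scrub_fim_tokens("<MID>return a + b<EOT>"))  # -> "return a + b"
```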
While building a service with Docker, the following error is raised. **output**
```
291.9 RuntimeError:
291.9 The detected CUDA version (12.1) mismatches the version that was used to compile
291.9 PyTorch...
```
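A quick way to confirm the mismatch inside the image is to compare the CUDA version PyTorch was compiled against with the toolkit version `nvcc` reports. The helper below is a diagnostic sketch (it assumes `nvcc` is on the PATH inside the container); the usual remedy is to base the image on a CUDA toolkit matching the installed PyTorch wheel, or to install a wheel built for the image's CUDA.

```python
# Hypothetical diagnostic sketch: compare PyTorch's compile-time CUDA version
# with the CUDA toolkit found in the Docker image. A mismatch here is what
# produces the RuntimeError above when CUDA extensions are compiled.
import re
import subprocess

import torch

def toolkit_cuda_version() -> str:
    """Parse the CUDA toolkit version from `nvcc --version` output."""
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    match = re.search(r"release (\d+\.\d+)", out)
    return match.group(1) if match else "unknown"

torch_cuda = torch.version.cuda     # CUDA version the PyTorch wheel was built for
nvcc_cuda = toolkit_cuda_version()  # CUDA toolkit present in the image, e.g. "12.1"

if torch_cuda != nvcc_cuda:
    print(f"Mismatch: PyTorch built for CUDA {torch_cuda}, toolkit is {nvcc_cuda}")
else:
    print(f"OK: both report CUDA {torch_cuda}")
```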