phalexo

Results 137 comments of phalexo

> > Likely a bug that was introduced into the later versions. Try the 0.1.11 version.
>
> How would I revert to this version? I installed Ollama over a month...

@BruceMacD Looks like at least 3 people have been able to get rid of their OOM problems by reverting to version 0.1.11. Clearly it is a bug when loading...
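For anyone who wants to pin the older build, the simplest route I know of is the Docker image tag. This is just a sketch and assumes the 0.1.11 tag is still published on Docker Hub:

```bash
# Pull and run the older image; the 0.1.11 tag is an assumption, check Docker Hub first
docker pull ollama/ollama:0.1.11
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.1.11
```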

```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build ggml/build/cuda --target server --config Release
//go:generate...
```
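After editing the generate file, I rebuild roughly like this; the `cd` assumes you are still in llm/llama.cpp, and `go generate` / `go build` are the standard steps from the repo's development docs as far as I remember:

```bash
# Regenerate the CUDA build and rebuild the ollama binary from the repo root
cd ../..
go generate ./...
go build .
```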

Look at the Docker files to see which version of Go to use. It may be your problem.
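A quick way to compare, assuming the Dockerfiles sit at the repo root as they did when I looked:

```bash
# Which Go image do the Docker builds use, and what is installed locally?
grep -in "golang" Dockerfile*
go version
```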

I have 4 GPUs with 12.2GiB VRAM and 1 GPU with 4GiB; all five are compute capability 5.2. I used to be able to use all five, and ollama was smart...
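If you only want ollama to see the larger cards, CUDA_VISIBLE_DEVICES is the usual knob; the device indices below are just an example, check yours with nvidia-smi first:

```bash
# List the devices and their indices
nvidia-smi -L
# Expose only the 12GiB cards to the server (indices are an example)
CUDA_VISIBLE_DEVICES=0,1,2,3 ollama serve
```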

> I cannot confirm this with gpt-pilot. For gpt-pilot, what works best right now is still Nous-Hermes 2 Pro 7B GGUF with a Q8_0 quant.
>
> But very important: set n_batch...
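If the model is served through ollama, I believe the llama.cpp n_batch knob corresponds to the num_batch option in the generate API, but treat this as a sketch; the model name below is a placeholder:

```bash
# Pass a batch size through the options object (num_batch is my best guess
# for the n_batch equivalent; the model name is a placeholder)
curl http://localhost:11434/api/generate -d '{
  "model": "nous-hermes2-pro",
  "prompt": "Say hi.",
  "stream": false,
  "options": { "num_batch": 512 }
}'
```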

Some people think that if you can connect to a local API, it actually works. I just wanted to make certain that Mistral-7B v0.2 and Hermes 2 Pro "work" in...
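The test I use is to ask for an actual completion rather than just checking that the port answers, something like:

```bash
# A real generation request, not just a connectivity check
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```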

What is your setup? Do you use ollama and define a system prompt for the models in the import file? Or do you use some litellm/ollama setup? Did you ollama...
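By a system prompt in the import file, I mean something along these lines; the file and model names are only placeholders:

```bash
# Minimal Modelfile sketch with a baked-in system prompt (names are placeholders)
cat > Modelfile <<'EOF'
FROM ./nous-hermes-2-pro.Q8_0.gguf
SYSTEM "You are a careful senior developer working inside gpt-pilot."
EOF
ollama create hermes2pro-dev -f Modelfile
```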

As far as the custom adapter is concerned, did you implement the specific set of functions that gpt-pilot uses and that the OpenAI API provides? Is the adapter part of gpt-pilot...

Models will necessarily grow; the ability to put multiple small models on different GPUs, or to split a large model across multiple GPUs, is a necessity, not a luxury. How many...
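For the first half of that, multiple small models on different GPUs, one pattern that already works is running one ollama instance per GPU, each bound to its own port; the device indices and ports below are just examples:

```bash
# One server per GPU, each on its own port (indices and ports are examples)
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11434 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11435 ollama serve &
```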