phalexo
ollama now provides an OpenAI-compatible API, so you don't need to use litellm any longer. That said, using local models comes with a different problem, i.e. they don't produce...
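A minimal sketch of what that looks like, using the standard openai Python client pointed at a local ollama server (the base URL and dummy API key follow ollama's OpenAI-compatible endpoint convention; the model name "notus" is just an example):

```python
# Sketch: the openai client talking to a local ollama server via its
# OpenAI-compatible endpoint. "notus" is an example model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="notus",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```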
If you wanted to do the work, you could probably set up a process pool, which would take work off its own queue, and then manage multiple ollama(s) running against...
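A rough sketch of that idea, assuming you have already started several ollama servers on different ports (the ports, model name, and prompts here are placeholders):

```python
# Rough sketch only: a pool of worker processes, each pulling prompts off a
# shared queue and sending them to its "own" ollama instance.
import multiprocessing as mp
import requests

PORTS = [11434, 11435, 11436]  # one ollama instance per port (assumption)

def worker(port, jobs, results):
    while True:
        prompt = jobs.get()
        if prompt is None:          # sentinel: no more work
            break
        r = requests.post(
            f"http://localhost:{port}/api/generate",
            json={"model": "notus", "prompt": prompt, "stream": False},
        )
        results.put(r.json().get("response", ""))

if __name__ == "__main__":
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(p, jobs, results)) for p in PORTS]
    for p in procs:
        p.start()
    prompts = ["prompt one", "prompt two", "prompt three"]
    for prompt in prompts:
        jobs.put(prompt)
    for _ in procs:                 # one sentinel per worker
        jobs.put(None)
    for _ in prompts:
        print(results.get())
    for p in procs:
        p.join()
```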
> I have the same problem; fine-tuning one step takes me about one hour (8 A800 80GB GPUs). I think the problem is that 'accelerate', although it distributes weights to different...
p.s. It is NOT actually implementing anything. It is simply dishing out useless advice to do it yourself.
> I tried to run the model with a CPU-only Python driver file but unfortunately it always failed after a few attempts. And here is my adapted file: > >...
You can create a callback and clear the cache every now and then, and maybe do gc.collect(). To improve performance, the allocator "refuses" to let cached memory go, i.e. an OOM....
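A sketch of that callback, assuming a Hugging Face transformers Trainer; the every-200-steps interval is an arbitrary choice, tune it for your workload:

```python
# Sketch: periodically drop Python garbage and release cached CUDA blocks
# during training, via a transformers TrainerCallback.
import gc
import torch
from transformers import TrainerCallback

class ClearCacheCallback(TrainerCallback):
    def __init__(self, every_n_steps=200):
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step % self.every_n_steps == 0:
            gc.collect()                 # drop unreachable Python objects
            torch.cuda.empty_cache()     # return cached CUDA memory to the driver
        return control

# trainer = Trainer(..., callbacks=[ClearCacheCallback(every_n_steps=200)])
```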
This is what I have in my notus_modelfile:

FROM /opt/data/data/TheBloke/notus-7B-v1-GGUF/notus-7b-v1.Q6_K.gguf
PARAMETER temperature 1
PARAMETER stop

Then you run `ollama create notus -f notus_modelfile`, and then `ollama run notus` or litellm...
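If you go the litellm route instead of the CLI, a minimal sketch (the api_base assumes the default local ollama port, and the model name matches the `ollama create` above):

```python
# Sketch: calling the newly created "notus" model through litellm's
# ollama provider rather than "ollama run".
from litellm import completion

response = completion(
    model="ollama/notus",
    api_base="http://localhost:11434",
    messages=[{"role": "user", "content": "Hello, notus!"}],
)
print(response.choices[0].message.content)
```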
Yes, it does exactly this. The only conjecture I have is that it is overrunning its rather small context length of 8K. I have seen this many times using it with gpt-pilot; it...
> Hey guys, it's happening when you hit the context size (which is set to 2048). You can increase the context as a workaround w/ `/set parameter num_ctx 8192`...
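For non-interactive use, the same context-size override can, as far as I know, be passed per request through ollama's REST API options; a small sketch (model name and prompt are placeholders):

```python
# Sketch: raising num_ctx per request via ollama's /api/generate options,
# instead of the interactive /set command.
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "notus",
        "prompt": "Summarize this long document ...",
        "stream": False,
        "options": {"num_ctx": 8192},   # context window in tokens
    },
)
print(r.json()["response"])
```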
Likely a bug that was introduced in a later version. Try version 0.1.11.