phalexo
> Seems to work.

Can I assume you were also able to get an image out of it? Since they are already using the "accelerate" library, I hoped it...
> Yes I can now confirm I was able to get images out of it with multi GPUs.
>
> Also you can change `t5 = T5Embedder(device="cpu")` to be a...
This does sound interesting. If this is what is happening, then there is a problem. I tried using phi3 models within an agent framework and got gibberish output that looked...
Any progress so far?

On Thu, Apr 11, 2024 at 12:43 PM, Soundmovin46 wrote:

> api really are to much I'm trying to uses groq
Rate limit for what? Is Groq access free?
Thanks for letting me know. Time to move on to something else, more advanced and flexible.

On Fri, Apr 19, 2024 at 7:21 AM, techjeylabs wrote:

> hey there, ...
I get a similar error with both multiple GPUs and a single GPU, even when the model is really too small to trigger a genuine OOM. The same models appear to work on the host...
```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build ggml/build/cuda --target server --config Release
//go:generate ...
```
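After editing the flags, the build has to be regenerated; a minimal sketch of the remaining steps, assuming the standard from-source ollama workflow (Go and the CUDA toolkit installed, commands run from the repo root):

```bash
# Sketch only: rebuild ollama so the edited //go:generate cmake flags take effect.
cd ../..            # back from llm/llama.cpp to the ollama repo root
go generate ./...   # runs the //go:generate directives, compiling llama.cpp with the custom flags
go build .          # builds the ollama binary against the freshly generated libraries
./ollama serve      # start the locally built server to test the change
```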
> template for llamacpp
>
> main.exe --model models/new3/Phi-3-mini-4k-instruct-fp16.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 0 --interactive -ins -ngl 99 --simple-io --in-prefix "\n" --in-suffix "\n" -p ...
With a reduced context size of 60000 I can load a 128K model. The prompting is still messed up, though.

./main --model /opt/data/pjh64/Phi-3-mini-128K-Instruct.gguf/phi-3-mini-128K-Instruct_q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty ...
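If the messed-up prompting is just the chat template, one thing worth trying is passing the Phi-3 instruct markers explicitly instead of bare newlines. A minimal sketch, assuming the GGUF kept the `<|user|>` / `<|end|>` / `<|assistant|>` special tokens and that the remaining flags mirror the command above:

```bash
# Sketch only: wrap user turns in the Phi-3 instruct template instead of plain "\n".
# -e makes main interpret the \n escapes; exact token names depend on the GGUF conversion.
./main --model /opt/data/pjh64/Phi-3-mini-128K-Instruct.gguf/phi-3-mini-128K-Instruct_q8_0.gguf \
  --color --threads 30 -ngl 99 --ctx-size 60000 --interactive-first --simple-io -e \
  --in-prefix "<|user|>\n" \
  --in-suffix "<|end|>\n<|assistant|>\n" \
  --reverse-prompt "<|end|>"
```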