phalexo
> Seems to work.

Can I assume you were also able to get an image out of it? Since they are already using the "accelerate" library, I hoped it...
> Yes I can now confirm I was able to get images out of it with multi GPUs.
>
> Also you can change `t5 = T5Embedder(device="cpu")` to be a...
This does sound interesting. If this is what is happening, then there is a problem. I tried using phi3 models within an agent framework and got gibberish output that looked...
Any progress so far?

On Thu, Apr 11, 2024 at 12:43 PM, Soundmovin46 wrote:

> api really are to much I'm trying to uses groq
Rate limit for what? Is Groq access free?
Thanks for letting me know. Time to move on to something else, more advanced and flexible.

On Fri, Apr 19, 2024 at 7:21 AM, techjeylabs wrote:

> hey there, ...
I get a similar error with both multiple GPUs and a single GPU, even when the model is really too small to trigger a genuine OOM. The same models appear to work on the host...
```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build ggml/build/cuda --target server --config Release
//go:generate ...
```
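After editing the flags, the build has to be regenerated; a minimal sketch of the remaining steps, assuming the standard from-source ollama workflow (Go and the CUDA toolkit installed, commands run from the repo root):

```bash
# Sketch only: rebuild ollama so the edited //go:generate cmake flags take effect.
cd ../..            # back from llm/llama.cpp to the ollama repo root
go generate ./...   # runs the //go:generate directives, compiling llama.cpp with the custom flags
go build .          # builds the ollama binary against the freshly generated libraries
./ollama serve      # start the locally built server to test the change
```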
> template for llamacpp
>
> main.exe --model models/new3/Phi-3-mini-4k-instruct-fp16.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 0 --interactive -ins -ngl 99 --simple-io --in-prefix "\n" --in-suffix "\n" -p ...
With a reduced context size of 60000 I can load a 128K model. The prompting is still messed up, though.

./main --model /opt/data/pjh64/Phi-3-mini-128K-Instruct.gguf/phi-3-mini-128K-Instruct_q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty ...
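If the messed-up prompting is just the chat template, one thing worth trying is passing the Phi-3 instruct markers explicitly instead of bare newlines. A minimal sketch, assuming the GGUF kept the `<|user|>` / `<|end|>` / `<|assistant|>` special tokens and that the remaining flags mirror the command above:

```bash
# Sketch only: wrap user turns in the Phi-3 instruct template instead of plain "\n".
# -e makes main interpret the \n escapes; exact token names depend on the GGUF conversion.
./main --model /opt/data/pjh64/Phi-3-mini-128K-Instruct.gguf/phi-3-mini-128K-Instruct_q8_0.gguf \
  --color --threads 30 -ngl 99 --ctx-size 60000 --interactive-first --simple-io -e \
  --in-prefix "<|user|>\n" \
  --in-suffix "<|end|>\n<|assistant|>\n" \
  --reverse-prompt "<|end|>"
```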