m0wer

21 comments by m0wer

> can we `/set parameter num_gpu 32` at runtime? it would save a lot of tries of `ollama create [name] -f [modelfile]`.
>
> I'm using `litellm` and `autogen`, so...
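For what it's worth, `num_gpu` can also be passed per request through the `options` field of Ollama's HTTP API, which sidesteps rebuilding the model with `ollama create`. A minimal sketch, assuming a local Ollama server on the default port; the model name `llama2` is an assumption:

```python
import json
import urllib.request

# Sketch: set num_gpu (GPU layers to offload) per request via /api/generate,
# instead of baking it into a Modelfile. The model name is an assumption.
payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {"num_gpu": 32},  # same knob as `/set parameter num_gpu 32`
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```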

> I wonder if that has anything to do with this bug #1691

I don't think so. This bug occurs even on a fresh installation.

Currently no reviews at all, @darealdemayo. Thanks for the fix, @arceushui!

Simpler Dockerfile, but it works:

```dockerfile
# Dockerfile to build a pdf2htmlEX image
FROM ubuntu:bionic
RUN apt update && apt install wget -y
RUN wget https://github.com/pdf2htmlEX/pdf2htmlEX/releases/download/v0.18.8.rc1/pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb
RUN apt install "./pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb" -y
VOLUME ...
```
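Build it with `docker build -t pdf2htmlex .` (the tag is arbitrary); the exact `docker run` invocation depends on the `VOLUME` line truncated above.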

> Please re-open if you're still seeing the out-of-memory crash on 0.1.22 or newer.

Still happens. It's not about this model in particular, just a matter of luck. But it...

> @m0wer can you share an updated server log and which model you're running?

Sure! With 0.1.24 and two GPUs of different memory sizes (RTX 2060 GB and RTX 3090...

> hi @m0wer, really sorry about this. Are you still seeing this with the latest 0.1.28?

Looks better now; with 0.1.29 it's not crashing.

Sounds like something https://github.com/vllm-project/vllm has sorted out (queuing + configurable number of workers).

> except vllm doesn't know how to run GGUF models and is very hungry in terms of memory consumption.

Agreed. Would be great to have parallelism in Ollama instead.

> Does anyone know if there is a way to render a loading animation during the initial load?

You can try https://stackoverflow.com/a/66535945.