m0wer

21 comments by m0wer

> can we `/set parameter num_gpu 32` at runtime? it would save a lot of tries of `ollama create [name] -f [modelfile]`.
>
> I'm using `litellm` and `autogen`, so...
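For what it's worth, `num_gpu` can also be passed per request through the `options` field of Ollama's HTTP API, which sidesteps rebuilding the model with `ollama create`. A minimal sketch, assuming a local Ollama server on the default port; the model name `llama2` is an assumption:

```python
import json
import urllib.request

# Sketch: set num_gpu (GPU layers to offload) per request via /api/generate,
# instead of baking it into a Modelfile. The model name is an assumption.
payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {"num_gpu": 32},  # same knob as `/set parameter num_gpu 32`
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```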

> I wonder if that has anything to do with this bug #1691

I don't think so. This bug occurs even on a fresh installation.

Currently no reviews at all, @darealdemayo. Thanks for the fix, @arceushui!

Simpler Dockerfile, but it works:

```dockerfile
# Dockerfile to build a pdf2htmlEX image
FROM ubuntu:bionic
RUN apt update && apt install wget -y
RUN wget https://github.com/pdf2htmlEX/pdf2htmlEX/releases/download/v0.18.8.rc1/pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb
RUN apt install "./pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb" -y
VOLUME ...
```
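Build it with `docker build -t pdf2htmlex .` (the tag is arbitrary); the exact `docker run` invocation depends on the `VOLUME` line truncated above.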

> Please re-open if you're still seeing the out-of-memory crash on 0.1.22 or newer.

Still happens. It's not about this model in particular, just a matter of luck. But it...

> @m0wer can you share an updated server log and which model you're running?

Sure! With 0.1.24 and two GPUs of different memory sizes (RTX 2060 GB and RTX 3090...

> hi @m0wer, really sorry about this. Are you still seeing this with the latest 0.1.28?

Looks better now; with 0.1.29 it's not crashing.

Sounds like something https://github.com/vllm-project/vllm has sorted out (queuing + configurable number of workers).

> except vllm doesn't know how to run GGUF models and is very hungry in terms of memory consumption.

Agreed. Would be great to have parallelism in Ollama instead.

> Does anyone know if there is a way to render a loading animation during the initial load?

You can try https://stackoverflow.com/a/66535945.