
ollama can't run wizardlm2:8x22b

lizhanyang505 opened this issue 1 year ago · 6 comments

What is the issue?

ollama run wizardlm2:8x22b
Error: llama runner process no longer running: 1 error: failed to create context with model '/mnt/data1/ollama/models/blobs/sha256-cfcf93119280c4a10c1df57335bad341e000cabbc4faff125531d941a5b0befa'

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:18:00.0 Off |                  Off |
| 30%   30C    P8             20W /  450W |      47MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:51:00.0 Off |                  Off |
| 30%   27C    P8             20W /  450W |      17MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

No response

lizhanyang505 · Apr 28 '24 07:04

This happened to me when my VRAM was insufficient. Can you run mixtral:8x22b? Alternatively, you could monitor the VRAM usage while loading this model.
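For context on the VRAM question (rough numbers, not from the thread): the default 4-bit quantization of an 8x22B model is on the order of 80 GB, while the two RTX 4090s in the nvidia-smi output above total about 48 GB, so running out of memory while creating the context is plausible. A simple way to watch VRAM as the model loads is plain nvidia-smi polling:

watch -n 1 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv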

helium729 · Apr 28 '24 14:04

It did run here on my good old 1070; make sure everything is up to date on your side.
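On Linux, the usual way to bring ollama itself up to date is to re-run the official install script, which fetches the latest release:

curl -fsSL https://ollama.com/install.sh | sh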

itsXactlY · Apr 28 '24 19:04

Hi @lizhanyang505, this could have been related to a bug in our calculation of the size needed to run mixtral models (#3836). If that is the case, the next release may fix your issue.

Since the error log mentions the context, it might also be worth checking the context size you're loading the model with, if you have changed that previously.
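As a sketch of what adjusting the context size looks like (the derived model name wizardlm2-2k and the Modelfile path are placeholders, not from the thread), you can lower num_ctx inside an interactive session with /set parameter, or bake a smaller value into a derived model with a Modelfile:

# In an interactive session:
ollama run wizardlm2:8x22b
>>> /set parameter num_ctx 2048

# Or persist it: write a Modelfile containing
#   FROM wizardlm2:8x22b
#   PARAMETER num_ctx 2048
# and build a derived model from it:
ollama create wizardlm2-2k -f ./Modelfile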

BruceMacD · May 01 '24 20:05

> It did run here on my good old 1070; make sure everything is up to date on your side.

The ollama version is the latest, 1.3.2. It doesn't work.

lizhanyang505 · May 06 '24 09:05

> Hi @lizhanyang505, this could have been related to a bug in our calculation of the size needed to run mixtral models (#3836). If that is the case, the next release may fix your issue.
>
> Since the error log mentions the context, it might also be worth checking the context size you're loading the model with, if you have changed that previously.

(screenshot omitted) Does Ollama v0.1.33 support this?

lizhanyang505 · May 06 '24 09:05

@lizhanyang505 it should work in v0.1.33 if you have enough VRAM:

❯ ollama run wizardlm2:8x22b
>>> hi
 Hello! How can I assist you today? If you have any questions or need information on a particular topic, feel free to ask. I'm here to help!

❯ ollama --version
ollama version is 0.1.33

Please let me know if the issue has been resolved for you.

BruceMacD · May 06 '24 21:05

This should be solved in 0.1.33. Let me know if that's not the case!

jmorganca · May 09 '24 21:05