ollama can't run wizardlm2:8x22b
What is the issue?
ollama run wizardlm2:8x22b
Error: llama runner process no longer running: 1 error:failed to create context with model '/mnt/data1/ollama/models/blobs/sha256-cfcf93119280c4a10c1df57335bad341e000cabbc4faff125531d941a5b0befa'
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:18:00.0 Off |                  Off |
| 30%   30C    P8             20W /  450W |      47MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:51:00.0 Off |                  Off |
| 30%   27C    P8             20W /  450W |      17MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
OS
Linux
GPU
Nvidia
CPU
Intel
Ollama version
No response
This happened to me when my VRAM was not sufficient. Can you run mixtral 8x22b? You could also monitor the VRAM usage while loading this model.
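For example (a minimal sketch, assuming the NVIDIA driver's nvidia-smi utility is available, as in the output above), you can watch the memory readout from a second terminal while the model loads:

watch -n 1 nvidia-smi

or get a more compact, continuously updated readout with:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1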
It did run here on my good old 1070; make sure everything is up to date on your side.
Hi @lizhanyang505, this could have been related to a bug in our calculation of the size needed to run mixtral models (#3836). If that is the case, the next release may fix your issue.
Since the context is mentioned in the error log, it may also be worth checking the context size you're loading the model with, if you have changed it previously.
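For example (a sketch assuming the standard Modelfile workflow; the model name wizardlm2-smallctx is just an illustration), you can create a variant with a smaller context window:

# Modelfile
FROM wizardlm2:8x22b
PARAMETER num_ctx 2048

ollama create wizardlm2-smallctx -f Modelfile
ollama run wizardlm2-smallctx

You can also set it interactively inside ollama run with /set parameter num_ctx 2048 before sending a prompt.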
It did run here on my good old 1070; make sure everything is up to date on your side.

The Ollama version is the latest, 1.3.2. It doesn't work.
Hi @lizhanyang505, this could have been related to a bug in our calculation of the size needed to run mixtral models (#3836). If that is the case, the next release may fix your issue.
Since the context is mentioned in the error log, it may also be worth checking the context size you're loading the model with, if you have changed it previously.
Is this supported in Ollama v0.1.33?
@lizhanyang505 it should work in v0.1.33 if you have enough VRAM:
❯ ollama run wizardlm2:8x22b
>>> hi
 Hello! How can I assist you today? If you have any questions or need information on a particular topic, feel free to ask. I'm here to help!
❯ ollama --version
ollama version is 0.1.33
Please let me know if the issue has resolved for you.
This should be solved in 0.1.33. Let me know if that's not the case!
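On Linux, upgrading is usually just a matter of re-running the install script (assuming you installed via the script from ollama.com) and then confirming the version:

curl -fsSL https://ollama.com/install.sh | sh
ollama --version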