Docker: no NVIDIA driver on your system
LocalAI version: 2.28.0
Environment, CPU architecture, OS, and Version: Docker OS: Ubuntu 24.10 CPU: AMD Ryzen 7 9800X3D GPU: RTX 5090
Describe the bug The docker image localai/localai:latest-aio-gpu-nvidia-cuda-12 fails to generate an image with the stablediffusion model, throwing the error:
failed to load model with internal loader: could not load model (no success): Unexpected err=RuntimeError('Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx'), type(err)=
To Reproduce Try to generate an image with the stablediffusion model in the web UI.
Expected behavior No error
Logs
8:41AM DBG context local model name not found, setting to default defaultModelName=stablediffusion 8:41AM DBG Parameter Config: &{PredictionOptions:{BasicModelRequest:{Model:DreamShaper_8_pruned.safetensors} Language: Translate:false N:0 TopP:0xc00264c800 TopK:0xc00264c808 Temperature:0xc00264c870 Maxtokens:0xc00264ca30 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00264c9c8 TypicalP:0xc00264c9c0 Seed:0xc00264caf8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:stablediffusion F16:0xc00264c365 Threads:0xc00264c6e0 Debug:0xc002600738 Roles:map[] Embeddings:0xc00264caf1 Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:
Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_IMAGE] KnownUsecases: PromptStrings:[xxxxxxxxxxxxxxxxxxxxxxx] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00264c8f8 MirostatTAU:0xc00264c8f0 Mirostat:0xc00264c878 NGPULayers:0xc00264ca38 MMap:0xc00264caf0 MMlock:0xc00264caf1 LowVRAM:0xc00264caf1 Grammar: StopWords:[] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00264cca0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m EnableParameters:negative_prompt,num_inference_steps IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} 
TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[{Filename:DreamShaper_8_pruned.safetensors SHA256: URI:huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors}] Description: Usage:curl http://localhost:8080/v1/images/generations
-H "Content-Type: application/json"
-d '{ "prompt": "| ", "step": 25, "size": "512x512" }' Options:[]} 8:41AM INF BackendLoader starting backend=diffusers modelID=stablediffusion o.model=DreamShaper_8_pruned.safetensors 8:41AM DBG Loading model in memory from file: /build/models/DreamShaper_8_pruned.safetensors 8:41AM DBG Loading Model stablediffusion with gRPC (file: /build/models/DreamShaper_8_pruned.safetensors) (backend: diffusers): {backendString:diffusers model:DreamShaper_8_pruned.safetensors modelID:stablediffusion assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0003cf808 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 8:41AM DBG Loading external backend: /build/backend/python/diffusers/run.sh 8:41AM DBG external backend is file: &{name:run.sh size:73 mode:448 modTime:{wall:0 ext:63879015916 loc:0x598eada0} sys:{Dev:69 Ino:29368129 Nlink:1 Mode:33216 Uid:0 Gid:0 X__pad0:0 Rdev:0 Size:73 Blksize:4096 Blocks:8 Atim:{Sec:1744791999 Nsec:809886729} Mtim:{Sec:1743419116 Nsec:0} Ctim:{Sec:1744791999 Nsec:808886738} X__unused:[0 0 0]}} 8:41AM DBG Loading GRPC Process: /build/backend/python/diffusers/run.sh 8:41AM DBG GRPC Service for stablediffusion will be running at: '127.0.0.1:44063' 8:41AM DBG GRPC Service state dir: /tmp/go-processmanager4159078518 8:41AM DBG GRPC Service Started 8:41AM DBG Wait for the service to start up 8:41AM DBG Options: ContextSize:1024 Seed:333832566 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:8 
PipelineType:"StableDiffusionPipeline" SchedulerType:"k_dpmpp_2m" CUDA:true 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout Initializing libbackend for diffusers 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout virtualenv activated 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout activated virtualenv has been ensured 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/google/protobuf/runtime_version.py:98: UserWarning: Protobuf gencode version 5.29.0 is exactly one major version older than the runtime version 6.30.2 at backend.proto. Please update the gencode to avoid compatibility violations in the next runtime release. 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr warnings.warn( 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using TRANSFORMERS_CACHEis deprecated and will be removed in v5 of Transformers. UseHF_HOMEinstead. 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr warnings.warn( 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Server started. 
Listening on: 127.0.0.1:44063 8:41AM DBG GRPC Service Ready 8:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00034ce58} sizeCache:0 unknownFields:[] Model:DreamShaper_8_pruned.safetensors ContextSize:1024 Seed:333832566 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/DreamShaper_8_pruned.safetensors Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]} 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Loading model DreamShaper_8_pruned.safetensors... 
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Request Model: "DreamShaper_8_pruned.safetensors" 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ContextSize: 1024 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Seed: 333832566 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr NBatch: 512 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr F16Memory: true 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr MMap: true 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr NGPULayers: 99999999 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Threads: 8 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ModelFile: "/build/models/DreamShaper_8_pruned.safetensors" 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr PipelineType: "StableDiffusionPipeline" 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr SchedulerType: "k_dpmpp_2m" 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr CUDA: true 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ModelPath: "/build/models" 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 254902.45it/s] Loading pipeline components...: 0%| | 0/6 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel: 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ['text_model.embeddings.position_ids'] Loading pipeline components...: 100%|██████████| 6/6 [00:00<00:00, 50.85it/s] 8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passingsafety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. 
Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . 8:41AM ERR Server error error="failed to load model with internal loader: could not load model (no success): Unexpected err=RuntimeError('Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx'), type(err)=<class 'RuntimeError'>"
Additional context
Note that chat completions correctly use GPU acceleration.
The Docker Compose configuration:
```yaml
services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    container_name: localai
    runtime: nvidia
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
    volumes:
      - /opt/openwebui/data_models:/build/models:cached
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
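For ruling out a device-passthrough problem, one thing worth trying is making GPU visibility explicit via the standard NVIDIA Container Toolkit environment variables, rather than relying on the runtime defaults. A minimal sketch of the relevant part of the service (the two `NVIDIA_*` variables are additions, not part of my original config):

```yaml
services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    runtime: nvidia
    environment:
      - DEBUG=true
      # NVIDIA Container Toolkit variables: expose all GPUs and the
      # compute + utility driver capabilities to the container
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

If `nvidia-smi` works inside the container but the diffusers backend still reports no driver, the problem is more likely in the backend's Python environment than in the container runtime.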
@SuperPat45 was it working on 2.27.x?
This may be related to PyTorch 2.6.0 not being compatible with CUDA 12.8, which the RTX 5000 series requires: https://discuss.pytorch.org/t/pytorch-support-for-sm120/216099 I get a similar error with the Open WebUI CUDA reranker model and with the Stable Diffusion web UI docker images. We need to wait for the upcoming release.
PyTorch 2.7.0 released with NVIDIA Blackwell Architecture Support: https://github.com/pytorch/pytorch/releases/tag/v2.7.0
Time to upgrade!
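To illustrate why upgrading fixes this: a CUDA wheel can only run kernels for the GPU architectures it was compiled for, and Blackwell consumer cards like the RTX 5090 report compute capability 12.0 ("sm_120"). A minimal sketch of that check in plain Python (the arch lists below are assumptions based on the linked forum thread; verify locally with `torch.cuda.get_arch_list()`):

```python
# Architectures assumed to be bundled in PyTorch 2.6.0 CUDA wheels
# (no Blackwell support) vs. PyTorch 2.7.0 (adds sm_100/sm_120).
TORCH_2_6_ARCHS = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
TORCH_2_7_ARCHS = TORCH_2_6_ARCHS + ["sm_100", "sm_120"]


def has_kernel_image(arch_list, compute_capability):
    """Return True if a wheel built for arch_list ships a kernel image
    usable on a GPU of the given compute capability. PTX forward
    compatibility is ignored for simplicity."""
    major, minor = compute_capability
    return f"sm_{major}{minor}" in arch_list


RTX_5090 = (12, 0)  # Blackwell, reported as sm_120
print(has_kernel_image(TORCH_2_6_ARCHS, RTX_5090))  # False
print(has_kernel_image(TORCH_2_7_ARCHS, RTX_5090))  # True
```

When the check fails at runtime, PyTorch raises exactly the "no kernel image is available for execution on the device" error seen below.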
While the stablediffusion model still doesn't work with the latest docker image (it now fails with a different error: CUDA error: no kernel image is available for execution on the device), the sd-3.5-medium-ggml and sd-3.5-large-ggml models are now properly GPU-accelerated, so I'm closing the issue.
Sorry @SuperPat45 -- is there a workaround? New here.
The only workaround is to use the GGML versions of the Stable Diffusion 3.5 or Flux.1-dev models from the gallery.
Unfortunately, the GGML versions of Flux.1-schnell (which has a commercially friendly license) and Stable Diffusion 3.5 Large Turbo (better performance) are missing from the gallery…