mistral.rs

Unexpected CUDA out of memory for minimal example

Open • rwesterteiger opened this issue 4 months ago • 0 comments

Description

Loading a simple quantized GGUF model with CUDA fails with: Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")

log.txt

Environment

  • Linux Mint 21.3 Virginia
  • GeForce RTX 2060 SUPER (8 GiB VRAM)
  • CUDA 12.6

While the sample below was running, less than 1 GiB of VRAM was in use, so the out-of-memory error is unexpected.

Sample code

    // Download and load the quantized GGUF model from the Hugging Face hub.
    let model = GgufModelBuilder::new(
        "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
        vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
    )
    .build()
    .await?;
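
For reference, a complete, self-contained reproduction would look roughly like the following. This is a minimal sketch assuming the high-level `mistralrs` crate plus `tokio` and `anyhow` as dependencies; everything outside the builder call above is illustrative scaffolding, not taken from the original report.

    use anyhow::Result;
    use mistralrs::GgufModelBuilder;

    #[tokio::main]
    async fn main() -> Result<()> {
        // Download and load the quantized GGUF model; this is the call
        // that fails with CUDA_ERROR_OUT_OF_MEMORY on the RTX 2060 SUPER.
        let model = GgufModelBuilder::new(
            "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
            vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
        )
        .build()
        .await?;

        // Keep the model alive briefly so VRAM usage can be inspected
        // (e.g. with nvidia-smi) before the program exits.
        let _ = model;
        Ok(())
    }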

Versions used

  • Rust toolchain:
    • stable-x86_64-unknown-linux-gnu (default)
    • rustc 1.81.0 (eeb90cda1 2024-09-04)

Cargo.toml.txt

  • Git commit hashes:
    • mistralrs: 329e0e8c5a8403ed50ab829317df79c4823be80a
    • mistralrs-core: 329e0e8c5a8403ed50ab829317df79c4823be80a

rwesterteiger • Oct 03 '24