mistral.rs
Unexpected CUDA out of memory for minimal example
Description
Loading a simple quantized GGUF model with CUDA fails with: Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
Environment
- Linux Mint 21.3 Virginia
- GeForce RTX 2060 SUPER (8 GiB VRAM)
- CUDA 12.6
While the sample below was running, less than 1 GiB of VRAM was in use.
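For context, a rough back-of-envelope estimate (my own arithmetic, not from the report): Q4_K_M averages roughly 4.8 bits per weight, so the weights of an 8B-parameter model alone should take about 4.5 GiB, comfortably under the card's 8 GiB, which is why the OOM is unexpected:

```rust
// Back-of-envelope estimate of the quantized weight footprint.
// Assumption: Q4_K_M averages ~4.8 bits per weight (approximate figure,
// consistent with the ~4.9 GB size of the GGUF file).
fn main() {
    let params: f64 = 8.0e9; // Llama 3.1 8B parameter count
    let bits_per_weight: f64 = 4.8; // approximate Q4_K_M average
    let gib = params * bits_per_weight / 8.0 / (1024.0f64 * 1024.0 * 1024.0);
    println!("~{gib:.1} GiB of weights"); // well under the 8 GiB card
}
```

This ignores the KV cache and CUDA context overhead, but even with those the model should fit on an 8 GiB GPU.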
Sample code
use mistralrs::GgufModelBuilder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let model = GgufModelBuilder::new(
        "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
        vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
    )
    .build()
    .await?;
    Ok(())
}
Versions used
- Rust toolchain:
- stable-x86_64-unknown-linux-gnu (default)
- rustc 1.81.0 (eeb90cda1 2024-09-04)
- GIT hashes:
- mistralrs: 329e0e8c5a8403ed50ab829317df79c4823be80a
- mistralrs-core: 329e0e8c5a8403ed50ab829317df79c4823be80a