mistral.rs
Unexpected CUDA out of memory for minimal example
Description
Loading a simple quantized GGUF model with CUDA fails with: Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
Environment
- Linux Mint 21.3 Virginia
- GeForce RTX 2060 SUPER (8 GiB VRAM)
- CUDA 12.6
While the sample below was running, less than 1 GiB of VRAM was in use.
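For context, a rough back-of-envelope estimate (my own arithmetic, not from the report): Q4_K_M averages roughly 4.8 bits per weight, so the weights of an 8B-parameter model alone should take about 4.5 GiB, comfortably under the card's 8 GiB, which is why the OOM is unexpected:

```rust
// Back-of-envelope estimate of the quantized weight footprint.
// Assumption: Q4_K_M averages ~4.8 bits per weight (approximate figure,
// consistent with the ~4.9 GB size of the GGUF file).
fn main() {
    let params: f64 = 8.0e9; // Llama 3.1 8B parameter count
    let bits_per_weight: f64 = 4.8; // approximate Q4_K_M average
    let gib = params * bits_per_weight / 8.0 / (1024.0f64 * 1024.0 * 1024.0);
    println!("~{gib:.1} GiB of weights"); // well under the 8 GiB card
}
```

This ignores the KV cache and CUDA context overhead, but even with those the model should fit on an 8 GiB GPU.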
Sample code
use mistralrs::GgufModelBuilder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let model = GgufModelBuilder::new(
        "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
        vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
    )
    .build()
    .await?;
    Ok(())
}
Versions used
- Rust toolchain:
- stable-x86_64-unknown-linux-gnu (default)
- rustc 1.81.0 (eeb90cda1 2024-09-04)
- GIT hashes:
- mistralrs: 329e0e8c5a8403ed50ab829317df79c4823be80a
- mistralrs-core: 329e0e8c5a8403ed50ab829317df79c4823be80a