candle
DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
I am relatively new to this, so I hope I am not just doing something very stupid :)
I am trying to adapt the quantized example to my use case. The inference code is essentially the same as in the example. In general the code works: I prompt 2 models on 2 separate GPUs in a loop. However, after N iterations (N differs from run to run but is always below 100), I encounter the error below.
I am running a quantized llama-3-8b-instruct model loaded from a .gguf file.
I would appreciate any tips if the error is on my side. Here is a link to the code.
NOTE: I'm running two A6000 GPUs. This is the nvcc version:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x58c00bd19556 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h410d4c66be4e37f9
1: 0x58c00bd43550 - core::fmt::write::he40921d4802ce2ac
2: 0x58c00bd16d4f - std::io::Write::write_fmt::h5de5a4e7037c9b20
3: 0x58c00bd19334 - std::sys_common::backtrace::print::h11c067a88e3bdb22
4: 0x58c00bd1abb7 - std::panicking::default_hook::{{closure}}::h8c832ecb03fde8ea
5: 0x58c00bd1a919 - std::panicking::default_hook::h1633e272b4150cf3
6: 0x58c00bd1b048 - std::panicking::rust_panic_with_hook::hb164d19c0c1e71d4
7: 0x58c00bd1af22 - std::panicking::begin_panic_handler::{{closure}}::h0369088c533c20e9
8: 0x58c00bd19a56 - std::sys_common::backtrace::__rust_end_short_backtrace::hc11d910daf35ac2e
9: 0x58c00bd1ac74 - rust_begin_unwind
10: 0x58c00b9113d5 - core::panicking::panic_fmt::ha6effc2775a0749c
11: 0x58c00b911923 - core::result::unwrap_failed::ha188096f98826595
12: 0x58c00ba2b6c4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::h4c289e05ebd51ae6
13: 0x58c00ba2aafc - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f32>>::hcbf6a15615cee068
14: 0x58c00ba2b1ca - alloc::sync::Arc<T,A>::drop_slow::h994a5bb01f1fc442
15: 0x58c00ba2af50 - alloc::sync::Arc<T,A>::drop_slow::h4a65dc7109aa30f1
16: 0x58c00ba1802a - candle_transformers::models::quantized_llama::ModelWeights::forward::had1312fe871968d8
17: 0x58c00b94121d - llm_bitcoin_inscription_analysis::llm::prompt::prompt_model::hbe917d2214140c60
18: 0x58c00b96e876 - core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut::h5f9d812f749ee289
19: 0x58c00b96b756 - rayon::iter::plumbing::Folder::consume_iter::h2c8efde69e0f7383
20: 0x58c00b971bfc - rayon::iter::plumbing::bridge_producer_consumer::helper::h814a881abff08b3e
21: 0x58c00b973006 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h8fb2eedfc5ec12fd
22: 0x58c00b90ce9f - rayon_core::registry::WorkerThread::wait_until_cold::hc0ea83de9f250620
23: 0x58c00bceaa32 - rayon_core::registry::ThreadBuilder::run::hedc5a5eddbc123f1
24: 0x58c00bcedbca - std::sys_common::backtrace::__rust_begin_short_backtrace::h14baabb9af848a11
25: 0x58c00bceeaef - core::ops::function::FnOnce::call_once{{vtable.shim}}::h49599ea7439698c3
26: 0x58c00bd1fb95 - std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6
27: 0x7b8d4de94ac3 - start_thread
at ./nptl/pthread_create.c:442:8
28: 0x7b8d4df26850 - __GI___clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
29: 0x0 - <unknown>
stack backtrace:
0: 0x58c00bd19556 - <std::sys_common::backtrace
thread '<unnamed>' panicked at library/core/src/panicking.rs:163:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted (core dumped)
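For context on the final "panic in a destructor during cleanup" abort: the first panic already comes from a destructor (frame 12 is `<CudaSlice<T> as Drop>::drop` unwrapping a failed driver call). After an illegal address error the CUDA context is left in an error state, so the buffers freed while the thread unwinds most likely hit failing driver calls too, and a panic raised while a thread is already unwinding cannot itself unwind, so Rust aborts the process. Below is a std-only sketch of that mechanism. `DeviceBuffer`, `release` and `run_worker` are hypothetical names for illustration, not cudarc's code; the guard it shows, `std::thread::panicking()`, is the usual way for a destructor to avoid panicking during unwinding.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a GPU allocation whose release can fail, e.g.
/// because the context is already in an error state. Not cudarc's code.
struct DeviceBuffer {
    release_fails: bool,
    skipped_frees: Arc<AtomicUsize>,
}

impl DeviceBuffer {
    /// Simulated driver call that frees the buffer.
    fn release(&self) -> Result<(), String> {
        if self.release_fails {
            Err("CUDA_ERROR_ILLEGAL_ADDRESS".to_string())
        } else {
            Ok(())
        }
    }
}

impl Drop for DeviceBuffer {
    fn drop(&mut self) {
        if let Err(err) = self.release() {
            if thread::panicking() {
                // Already unwinding: panicking again here is what produces
                // "panic in a destructor during cleanup" and an abort,
                // so only record the failure.
                self.skipped_frees.fetch_add(1, Ordering::SeqCst);
            } else {
                // Not unwinding: surfacing the error as a panic is still allowed.
                panic!("failed to free device buffer: {err}");
            }
        }
    }
}

/// Spawns a worker that holds a buffer whose release fails and then panics,
/// like a forward pass hitting an error. Returns (worker panicked, frees skipped).
fn run_worker() -> (bool, usize) {
    let skipped = Arc::new(AtomicUsize::new(0));
    let skipped_in_worker = Arc::clone(&skipped);
    let handle = thread::spawn(move || {
        let _buffer = DeviceBuffer { release_fails: true, skipped_frees: skipped_in_worker };
        panic!("illegal memory access during forward");
    });
    // The process survives: join() reports the worker's panic as an Err.
    let worker_panicked = handle.join().is_err();
    (worker_panicked, skipped.load(Ordering::SeqCst))
}

fn main() {
    let (panicked, skipped) = run_worker();
    println!("worker panicked: {panicked}, frees skipped while unwinding: {skipped}");
}
```

With the guard, the worker fails with an ordinary panic that `join()` reports, instead of taking the whole process down; the trade-off is that a skipped free means the memory is not released.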
I have encountered similar problems with quantized models. Running the program under compute-sanitizer gives the following (the actual backtrace is much larger than what I have pasted):
========= COMPUTE-SANITIZER
Got device: Cuda(CudaDevice(DeviceId(1)))
Loading model contents..
Creating Model Weights
Got context length: 4096
Getting tokenizer
Device: Cuda(CudaDevice(DeviceId(1)))
Starting inferencing...
Getting tokens
Got 36644 tokens
parsing prompt tokens
Getting logits processor
getting first next_token
Device: Cuda(CudaDevice(DeviceId(1)))
inner: got input
========= Invalid __global__ write of size 1 bytes
========= at quantize_q8_1+0x560
========= by thread (192,0,0) in block (4,8195,0)
========= Address 0x3392b3015c is out of bounds
========= and is 29,021 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x334660]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:cudarc::driver::sys::sys_12030::Lib::cuLaunchKernel::hfe263ae91520c126 in src/driver/sys/sys_12030.rs:15843 [0xedff84]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:cudarc::driver::result::launch_kernel::h7dc2536fdc655909 in "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/result.rs:983 [0xb38613]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::quantized::cuda::quantize_q8_1::h9c8bbb2250b40525 in src/quantized/cuda.rs:59 [0xcc7b0d]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::quantized::cuda::mul_mat_via_q8_1::h7a6eaa861817963c in src/quantized/cuda.rs:320 [0xccdff1]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::quantized::cuda::QCudaStorage::dequantize_matmul::hece58c0f1fba08e1 in src/quantized/cuda.rs:551 [0xcd1d80]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::quantized::cuda::QCudaStorage::fwd::hb9af44186a442957 in src/quantized/cuda.rs:475 [0xcd09bf]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:_$LT$candle_core..quantized..QTensor$u20$as$u20$candle_core..custom_op..CustomOp1$GT$::cuda_fwd::h3dae5127f4b64a02 in src/quantized/mod.rs:522 [0xb3b32c]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::storage::Storage::apply_op1::hb1ae45780a7f5aac in src/storage.rs:203 [0xb0985d]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_core::custom_op::_$LT$impl$u20$candle_core..tensor..Tensor$GT$::apply_op1_no_bwd::hd843f14a3d52ec41 in src/custom_op.rs:157 [0xaf403c]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:_$LT$candle_core..quantized..QMatMul$u20$as$u20$candle_core..Module$GT$::forward::h733f98c2fe1f3d36 in src/quantized/mod.rs:529 [0xb3b3f5]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_transformers::models::quantized_llama::QMatMul::forward::h551d57b56deeba80 in src/models/quantized_llama.rs:27 [0x7149da]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_transformers::models::quantized_llama::LayerWeights::forward_attn::h56c8c3f144a68fc7 in src/models/quantized_llama.rs:173 [0x717e69]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:candle_transformers::models::quantized_llama::ModelWeights::forward::h0dfb5d857c758ce3 in src/models/quantized_llama.rs:476 [0x71c732]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:_$LT$ai_lib..models..llama_3_70b_instruct_32k_gguf..Llama3_70bInstruct32kGGUF$u20$as$u20$ai_lib..models..model_wrapper..ModelWrapper$GT$::inference::h279f36a3b9cc9fbf in ai-lib/src/models/llama_3_70b_instruct_32k_gguf.rs:189 [0x198cd2]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:cli::main::h8ee7a7979e775a0e in cli/src/main.rs:103 [0x19587a]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:core::ops::function::FnOnce::call_once::h772466b7bf645693 in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250 [0x194fcb]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:std::sys_common::backtrace::__rust_begin_short_backtrace::h8af7e217acbaa4da in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:161 [0x19540e]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h67443931d40186ff in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166 [0x194f71]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:std::rt::lang_start_internal::h103c42a9c4e95084 in library/std/src/rt.rs:148 [0x1b41a03]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:std::rt::lang_start::he3400f8001dc9f83 in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:165 [0x194f4a]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:main [0x195c7e]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
========= Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:58 [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main in ../csu/libc-start.c:379 [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0x194e45]
========= in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
The same report then repeats, because successive threads write to successively higher addresses:
========= Invalid __global__ write of size 1 bytes
========= at quantize_q8_1+0x560
========= by thread (223,0,0) in block (4,8195,0)
========= Address 0x3392b3017b is out of bounds
========= and is 29,052 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x334660]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
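The "bytes after" figures in these two reports can be checked by hand. A small sketch using only the numbers above: they match if the distance is measured from the last valid byte of the 75,497,472-byte allocation (that is my reading of how compute-sanitizer counts, inferred from these numbers), and in block (4,8195,0) the address delta between threads 192 and 223 equals their thread-index delta, i.e. one byte per thread. `bytes_after` is a hypothetical helper name.

```rust
// Numbers copied from the compute-sanitizer reports above.
const ALLOC_BASE: u64 = 0x338e329000;
const ALLOC_SIZE: u64 = 75_497_472;

/// Distance reported as "N bytes after the nearest allocation", assuming
/// compute-sanitizer measures from the last valid byte of the allocation.
fn bytes_after(address: u64) -> u64 {
    let last_valid_byte = ALLOC_BASE + ALLOC_SIZE - 1;
    address - last_valid_byte
}

fn main() {
    // (thread index in block (4,8195,0), faulting address, reported distance)
    let reports: [(u64, u64, u64); 2] = [
        (192, 0x3392b3015c, 29_021),
        (223, 0x3392b3017b, 29_052),
    ];
    for (thread_x, address, reported) in reports {
        assert_eq!(bytes_after(address), reported);
        println!("thread {thread_x}: {} bytes after the allocation", bytes_after(address));
    }
    // One byte per thread: the address delta equals the thread-index delta.
    let (t0, a0, _) = reports[0];
    let (t1, a1, _) = reports[1];
    assert_eq!(a1 - a0, t1 - t0);
}
```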
...though, since these are concurrent threads, not always in linear order:
========= Invalid __global__ write of size 1 bytes
========= at quantize_q8_1+0x560
========= by thread (0,0,0) in block (10,8194,0)
========= Address 0x3392b2e344 is out of bounds
========= and is 21,317 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x334660]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
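The three reports also fit a simple model of the kernel's indexing. The sketch below assumes the kernel follows ggml's (llama.cpp's) `quantize_q8_1`, which I believe candle's CUDA kernels are ported from: a `block_q8_1` is a 4-byte `half2` scale pair followed by 32 `int8` quants (36 bytes per 32 values), blocks have 256 threads along x, and the row is taken from `blockIdx.y`. The padded row length of 8192 values is a hypothesis picked because it makes the arithmetic line up, not something read from the code. Under those assumptions the sketch reproduces all three faulting offsets exactly, and the 75,497,472-byte allocation holds exactly 8192 such rows, so blocks with y = 8194 and 8195 write past its end. It does not explain why the launch covers more rows than were allocated. `write_offset` and `rows_in_allocation` are hypothetical helper names.

```rust
// Assumed q8_1 layout (ggml's block_q8_1): a 4-byte half2 scale pair
// followed by 32 int8 quants, so 36 bytes per 32 values.
const QK8_1: u64 = 32;
const BLOCK_BYTES: u64 = 36;
const SCALE_BYTES: u64 = 4;
// Assumed launch shape: 256 threads per block along x, one row per blockIdx.y.
const THREADS_PER_BLOCK: u64 = 256;
// Hypothesis: padded row length, chosen because it makes the numbers match.
const KX_PADDED: u64 = 8192;
// Allocation size from the compute-sanitizer reports.
const ALLOC_SIZE: u64 = 75_497_472;

/// Byte offset (from the start of the destination buffer) of the int8 quant
/// written by thread `thread_x` of block (`block_x`, `block_y`).
fn write_offset(block_x: u64, block_y: u64, thread_x: u64) -> u64 {
    let col = block_x * THREADS_PER_BLOCK + thread_x; // position within the row
    let row = block_y; // blockDim.y assumed to be 1
    let flat = row * KX_PADDED + col; // flat index in the padded layout
    let block = flat / QK8_1; // which q8_1 block
    let quant = flat % QK8_1; // which quant inside that block
    block * BLOCK_BYTES + SCALE_BYTES + quant
}

/// How many padded rows the allocation can hold under the same assumptions.
fn rows_in_allocation() -> u64 {
    ALLOC_SIZE / BLOCK_BYTES * QK8_1 / KX_PADDED
}

fn main() {
    let base: u64 = 0x338e329000;
    // (block_x, block_y, thread_x, faulting address) from the three reports
    let reports: [(u64, u64, u64, u64); 3] = [
        (4, 8195, 192, 0x3392b3015c),
        (4, 8195, 223, 0x3392b3017b),
        (10, 8194, 0, 0x3392b2e344),
    ];
    for (bx, by, tx, address) in reports {
        assert_eq!(write_offset(bx, by, tx), address - base);
    }
    println!("allocation holds {} rows; faulting rows: 8194, 8195", rows_in_allocation());
}
```

If this reading is right, the useful thing to compare is the row count used to size the q8_1 buffer in `mul_mat_via_q8_1` (src/quantized/cuda.rs:320 in the trace) against the row count passed to the `quantize_q8_1` launch.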
And it keeps going until we eventually reach this:
thread 'main' panicked at "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
stack backtrace:
0: 0x55b30644ed72 - std::backtrace_rs::backtrace::libunwind::trace::he4ee80166a02c846
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
This is the last backtrace printed (and a different one! See https://github.com/coreylowman/cudarc/issues/277 for more on that) before we reach the "end":
thread 'main' panicked at library/core/src/panicking.rs:164:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 4835 errors
========= ERROR SUMMARY: 4735 errors were not printed. Use --print-limit option to adjust the number of printed errors