cuda-memcheck: "Address ... is out of bounds"
Describe the bug
`cuda-memcheck` reports a scrolling stream of errors on `example-mnist-classification` like this:
```
========= Invalid __global__ write of size 4
=========     at 0x00001780 in void copy_kernel<float>(cublasCopyParams<float>)
=========     by thread (191,0,0) in block (0,0,0)
=========     Address 0x7fd319043efc is out of bounds
```
To Reproduce
Steps to reproduce the behaviour:
- `cargo build`
- `cuda-memcheck target/debug/example-mnist-classification mnist linear`
Expected behavior
No errors.
Please complete the following information:
- System: Manjaro Linux
- Version: Git commit 854043d50bd
- Rust: rustc 1.62.1 (e092d0b6b 2022-07-16)
- Environment:
- Backends (if relevant):
- cuda:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     5W /  N/A |      4MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1936      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+
```
Additional context
Note that running `example-mnist-classification` without `cuda-memcheck` works just fine and is able to converge. I only discovered this while working on #159, where training with CUDA does crash with `CUDA_ERROR_ILLEGAL_ADDRESS` when trying to copy from GPU to host. Not sure it's the same issue, but it seems related.
Looks like the `copy()` function for CUDA doesn't check that the destination is large enough (here). By adding a check like this:
```rust
macro_rules! iblas_copy_for_cuda {
    ($t:ident) => {
        fn copy(
            &self,
            x: &SharedTensor<$t>,
            y: &mut SharedTensor<$t>,
        ) -> Result<(), ::coaster::error::Error> {
            assert_eq!(x.desc().size(), y.desc().size());
```
We now get a panic:
```
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `300`,
 right: `10`', coaster-blas/src/frameworks/cuda/mod.rs:23:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181:5
   4: coaster_blas::frameworks::cuda::<impl coaster_blas::plugin::Copy<f32> for coaster::backend::Backend<coaster::frameworks::cuda::Cuda>>::copy
             at ./coaster-blas/src/frameworks/cuda/helper.rs:109:13
   5: <juice::layers::common::linear::Linear as juice::layer::ComputeParametersGradient<f32,B>>::compute_parameters_gradient
             at ./juice/src/layers/common/linear.rs:220:9
```
So we're trying to copy 300 floats into a tensor of size 10. And it happens in the Linear layer here.
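For what it's worth, a guard that surfaces this as a regular error rather than a panic might be a nicer permanent shape for the check. Below is a minimal sketch extending the macro from above; the `::coaster::plugin::Error::Plugin` variant and the `.into()` conversion into `::coaster::error::Error` are my assumptions about the coaster error types, so treat it as an illustration rather than a tested patch:

```rust
macro_rules! iblas_copy_for_cuda {
    ($t:ident) => {
        fn copy(
            &self,
            x: &SharedTensor<$t>,
            y: &mut SharedTensor<$t>,
        ) -> Result<(), ::coaster::error::Error> {
            // Refuse to launch the cuBLAS copy kernel when the destination
            // tensor cannot hold every element of the source tensor.
            if x.desc().size() != y.desc().size() {
                // Assumed error constructor; the exact variant may differ.
                return Err(::coaster::plugin::Error::Plugin(
                    "copy: source and destination tensor sizes differ",
                )
                .into());
            }
            // ... existing cuBLAS copy call and Ok(()) would follow here ...
        }
    };
}
```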