CTranslate2
CTranslate2 copied to clipboard
CUDA DeviceAllocate segfault
#0 0x00007bc0622c6554 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#1 0x00007bc05573e59a in cub::CachingDeviceAllocator::DeviceAllocate(int, void**, unsigned long, CUstream_st*) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#2 0x00007bc05573ea99 in ctranslate2::cuda::CubCachingAllocator::allocate(unsigned long, int) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#3 0x00007bc055712796 in ctranslate2::StorageView::reserve(long) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#4 0x00007bc0557127f8 in ctranslate2::StorageView::resize(std::vector<long, std::allocator<long> >) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#5 0x00007bc0556f59f2 in void ctranslate2::ops::MatMul::compute<(ctranslate2::Device)1, float>(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView&) const ()
from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#6 0x00007bc055660d24 in ctranslate2::layers::dot_product_attention(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const*, ctranslate2::StorageView const*, ctranslate2::StorageView const*, ctranslate2::StorageView const*, long, ctranslate2::StorageView&, ctranslate2::StorageView*, bool, float, bool, bool, long, ctranslate2::layers::Alibi*, ctranslate2::StorageView*) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#7 0x00007bc05566208d in ctranslate2::layers::MultiHeadAttention::operator()(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const*, ctranslate2::StorageView&, ctranslate2::StorageView*, ctranslate2::StorageView*, ctranslate2::StorageView*, ctranslate2::Padder const*, ctranslate2::Padder const*, bool, ctranslate2::StorageView*, long) const ()
from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
CT2_VERBOSE=3 LD_LIBRARY_PATH=/home/.local/lib/python3.10/site-packages/ctranslate2.libs whisper-ctranslate2 --language=en --verbose=true --model small -f srt --output_dir /tmp/ foo.mp4
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce 940MX Off | 00000000:01:00.0 Off | N/A |
| N/A 50C P8 N/A / 200W | 1988MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------
-
small
model (sadly) doesn't hold within my 2GB GPU but causes a segfault instead of failing properly. - happen with both the wheel and a hand-compiled
.so
-
tiny
model works (no OOM) - Important and unexpected useful workaround: Setting
CT2_CUDA_ALLOW_BF16=1 CT2_CUDA_ALLOW_FP16=1
I could getsmall
to run successfully on this GPU (!)