[BUG]: Randomly recurring test_cufile.py::test_get_stats_l3 Segmentation faults
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
cuda.bindings
Describe the bug
In routine testing on a bare-metal (NOT WSL) Ubuntu 24.04 linux-64 workstation, I'm seeing intermittent segmentation faults in test_cufile.py::test_get_stats_l3, e.g.:
smc120-0004.ipp2a2.colossus.nvidia.com:/wrk/logs $ grep -a Segmentation *
qa_bindings_linux_2025-12-04+171342_tests_log.txt:Fatal Python error: Segmentation fault
qa_bindings_linux_2025-12-04+171342_tests_log.txt:../ctk-next/qa/13.1.0/qa_bindings_linux_tests.sh: line 60: 5457 Segmentation fault (core dumped) python -m pytest -ra -s -vv tests/
qa_bindings_linux_2025-12-05+214218_tests_log.txt:Fatal Python error: Segmentation fault
qa_bindings_linux_2025-12-05+214218_tests_log.txt:../ctk-next/qa/13.1.0/qa_bindings_linux_tests.sh: line 61: 48514 Segmentation fault (core dumped) CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1 python -m pytest -ra -s -vv tests/
qa_bindings_linux_2025-12-06+224850_tests_log.txt:Fatal Python error: Segmentation fault
qa_bindings_linux_2025-12-06+224850_tests_log.txt:../ctk-next/qa/13.1.0/qa_bindings_linux_tests.sh: line 60: 84340 Segmentation fault (core dumped) python -m pytest -ra -s -vv tests/
I'm attaching one of the log files. Please see there for details.
qa_bindings_linux_2025-12-06+224850_tests_log.txt
How to Reproduce
See commands in attached log file. Essentially:
cd cuda_bindings/
pip install ...
pytest -ra -s -v tests/
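Because the crash is intermittent, a single run will often pass. A minimal sketch of repeating the test command and counting the runs that die from a signal is below. The helper name `count_crashes` and the trial count are hypothetical, and it assumes a POSIX system, where `subprocess` reports death by signal N as return code -N.

```python
import subprocess
import sys


def count_crashes(cmd, trials, env=None):
    """Run cmd `trials` times and return how many runs were killed by a signal."""
    crashes = 0
    for _ in range(trials):
        result = subprocess.run(
            cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child was terminated
        # by signal N (for example -11 for SIGSEGV, -8 for SIGFPE).
        if result.returncode < 0:
            crashes += 1
    return crashes


# Example (hypothetical trial count), run from cuda_bindings/ after installing:
#   cmd = [sys.executable, "-m", "pytest", "-ra", "-s", "-vv", "tests/"]
#   print(count_crashes(cmd, trials=20), "of 20 runs crashed")
```

To cover the second configuration seen in the logs, pass `env={**os.environ, "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM": "1"}` (this needs `import os`).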
@sourabgupta3
@rwgk could you check if https://github.com/NVIDIA/cuda-python/pull/1468 would fix it?
I deleted the comment I posted a few minutes ago. I'll have to try again :-(
Sorry, I forgot to check for the silent downgrade before, and it bit me again:
smc120-0009.ipp2a2.colossus.nvidia.com:/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/test_cufile_multi_v13.1_13de2c20 $ grep 'Successfully installed' ../cuda-python_qa_bindings_linux_2026-01-23+150622_build_log.txt
Successfully installed pip-25.3
Successfully installed packaging-26.0 setuptools-80.10.1 setuptools_scm-9.2.2 wheel-0.46.3
Successfully installed cuda-pathfinder-1.3.4.dev109+g13de2c20b iniconfig-2.3.0 packaging-26.0 pluggy-1.6.0 pygments-2.19.2 pytest-9.0.2
Successfully installed cython-3.2.4 packaging-26.0 pyclibrary-0.3.0 pyparsing-3.3.2 setuptools-80.10.1 setuptools_scm-9.2.2
Successfully installed cuda-bindings-13.1.2.dev95+g13de2c20b cython-3.2.4 numpy-2.4.1 py-cpuinfo-9.0.0 pyglet-2.1.12 pytest-benchmark-5.2.3 setuptools-80.10.1
Successfully installed Cython-3.2.4 packaging-26.0 setuptools-80.10.1 setuptools-scm-9.2.2
Successfully installed cuda-bindings-13.1.1 cuda-pathfinder-1.3.3
Successfully installed cuda-core-0.5.1.dev62+g13de2c20b pytest-randomly-4.0.1
Oh! The logs and summary I posted before were actually correct. I didn't realize that's a side-effect of the build isolation. TIL
(I'll repost the logs asap)
Explanation
The "Successfully installed cuda-bindings-13.1.1" message is from a temporary build environment, not your main virtual environment.
1. First installation (line 2185): cuda-bindings-13.1.2.dev95+g13de2c20b is installed in TestVenv.
2. Building cuda-core (lines 2244-2258): When installing cuda-core in editable mode, pip creates a temporary build environment (/tmp/rgrossekunst-tmp/pip-build-env-yu8kbrk0/overlay/) to install backend dependencies needed to build the package.
3. Backend dependency installation (lines 2247-2257): In that build environment, pip installs cuda-bindings==13.* (from cuda-core's pyproject.toml), which resolves to 13.1.1 from PyPI. The "Successfully installed" message refers to this temporary environment.
4. Final state: After the build, the main TestVenv still has cuda-bindings-13.1.2.dev95+g13de2c20b installed, which is why pip list shows that version.
This is pip's build isolation: backend dependencies are installed in a temporary environment for building, and those messages can be misleading because they refer to the build environment, not your main environment. The main environment is unaffected by those installations.
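One way to double-check which versions actually ended up in the main environment, independent of the "Successfully installed" messages, is to query the installed distribution metadata from the interpreter that runs the tests. This is only a sketch: the helper name `installed_version` is hypothetical, and the distribution names are the ones that appear in the build log above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the version of dist_name in the current environment, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names taken from the build log above.
for name in ("cuda-bindings", "cuda-pathfinder", "cuda-core"):
    print(name, installed_version(name))
```

Run with the TestVenv interpreter, this reports what `pip list` would show for those packages, not what was installed into the temporary build environment.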
@sourabgupta3 for awareness. Note: the results below are for CTK 13.1.1 (cuda_13.1.1_590.48.01_linux.run)
Reposting after convincing myself that the build worked as expected:
> could you check if https://github.com/NVIDIA/cuda-python/pull/1468 would fix it?
It seems to be better, but there is still >10% flakiness: 21 of the 200 trial logs contain a crash (see the summary below).
Full logs and additional files with many details are here (internal access only):
/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/test_cufile_multi_v13.1_13de2c20
The matching full build log:
/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-01-23+150622_build_log.txt
Here is a high-level summary based on the full log files:
================================================================================
QA Test Logs Analysis Summary
================================================================================
Total files analyzed: 200
Files with no flakes (all passed): 179
Files with crashes: 21
================================================================================
Error Details
================================================================================
Files with crashes (21):
- trial17_norm_log_2026-01-23+153614.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial1_ptds_log_2026-01-23+152315.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial20_norm_log_2026-01-23+153849.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial23_norm_log_2026-01-23+154116.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial25_ptds_log_2026-01-23+154315.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial2_ptds_log_2026-01-23+152404.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial32_ptds_log_2026-01-23+154932.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial45_ptds_log_2026-01-23+160048.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial47_ptds_log_2026-01-23+160227.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_buf_register_already_registered
Error messages:
invalid directIO size (KB) specified: 0 min: 64 max: 16384
invalid directIO size (KB) specified: 0 min: 1 max: 256
invalid poll threshold size (KB) specified: 0 min: 4 max: 18014398509481980
invalid io timeout specified, (ms) 0 min: 1 max: 1000
invalid directIO size (KB) specified: 0 min: 1 max: 256
Crash indicator: Fatal Python error: Floating point exception
- trial4_norm_log_2026-01-23+152516.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_cufile_read_write_host_memory
Error messages:
invalid directIO size (KB) specified: 0 min: 64 max: 16384
invalid directIO size (KB) specified: 0 min: 1 max: 256
invalid poll threshold size (KB) specified: 0 min: 4 max: 18014398509481980
invalid io timeout specified, (ms) 0 min: 1 max: 1000
invalid directIO size (KB) specified: 0 min: 1 max: 256
Crash indicator: Fatal Python error: Floating point exception
- trial50_ptds_log_2026-01-23+160440.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial54_norm_log_2026-01-23+160740.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial61_norm_log_2026-01-23+161345.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial67_norm_log_2026-01-23+161848.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial68_ptds_log_2026-01-23+161951.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial73_norm_log_2026-01-23+162328.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial7_norm_log_2026-01-23+152733.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial82_ptds_log_2026-01-23+163143.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial91_norm_log_2026-01-23+163852.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
- trial92_ptds_log_2026-01-23+164004.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_stats_start_stop
Error messages:
invalid directIO size (KB) specified: 0 min: 64 max: 16384
invalid directIO size (KB) specified: 0 min: 1 max: 256
invalid poll threshold size (KB) specified: 0 min: 4 max: 18014398509481980
invalid io timeout specified, (ms) 0 min: 1 max: 1000
invalid directIO size (KB) specified: 0 min: 1 max: 256
Crash indicator: Fatal Python error: Floating point exception
- trial95_ptds_log_2026-01-23+164220.txt
Number of crashes: 1
Crash at line 1:
rootdir: /wrk/forked/cuda-python/cuda_bindings
Test session start: ============================= test session starts ==============================
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
================================================================================
Overall Statistics
================================================================================
Total tests passed (across all files): 5006
================================================================================
ERROR Summary
================================================================================
1 ERROR tests/test_cufile.py::test_batch_io_cancel - cuda.bindings.cufile.cuFil...
1 ERROR tests/test_cufile.py::test_batch_io_large_operations - cuda.bindings.cu...
1 ERROR tests/test_cufile.py::test_buf_register_multiple_buffers - cuda.binding...
1 ERROR tests/test_cufile.py::test_get_parameter_min_max_value - cuda.bindings....
1 ERROR tests/test_cufile.py::test_get_stats_l3 - cuda.bindings.cufile.cuFileEr...
1 ERROR tests/test_cufile.py::test_handle_register - cuda.bindings.cufile.cuFil...
================================================================================
Error Type Summary
================================================================================
Crashes: 21 files
- trial17_norm_log_2026-01-23+153614.txt
- trial1_ptds_log_2026-01-23+152315.txt
- trial20_norm_log_2026-01-23+153849.txt
- trial23_norm_log_2026-01-23+154116.txt
- trial25_ptds_log_2026-01-23+154315.txt
- trial2_ptds_log_2026-01-23+152404.txt
- trial32_ptds_log_2026-01-23+154932.txt
- trial45_ptds_log_2026-01-23+160048.txt
- trial47_ptds_log_2026-01-23+160227.txt
- trial4_norm_log_2026-01-23+152516.txt
- trial50_ptds_log_2026-01-23+160440.txt
- trial54_norm_log_2026-01-23+160740.txt
- trial61_norm_log_2026-01-23+161345.txt
- trial67_norm_log_2026-01-23+161848.txt
- trial68_ptds_log_2026-01-23+161951.txt
- trial73_norm_log_2026-01-23+162328.txt
- trial7_norm_log_2026-01-23+152733.txt
- trial82_ptds_log_2026-01-23+163143.txt
- trial91_norm_log_2026-01-23+163852.txt
- trial92_ptds_log_2026-01-23+164004.txt
- trial95_ptds_log_2026-01-23+164220.txt
================================================================================
Counts of "Likely failing test"
================================================================================
18 tests/test_cufile.py::test_get_stats_l3
1 tests/test_cufile.py::test_buf_register_already_registered
1 tests/test_cufile.py::test_cufile_read_write_host_memory
1 tests/test_cufile.py::test_stats_start_stop
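For reference, the per-test counts and the overall crash rate can be recomputed from the report text with a few lines of Python. This is a sketch, not the script that produced the summary above. The names `tally_likely_failing` and `crash_rate` are hypothetical, and the sample input is an abbreviated excerpt in the same format as the report.

```python
import re
from collections import Counter

# Matches the test ID on each "Likely failing test:" line of the report.
LIKELY_RE = re.compile(r"Likely failing test:\s*(\S+)")


def tally_likely_failing(report_text):
    """Count how often each test is named on a 'Likely failing test:' line."""
    return Counter(LIKELY_RE.findall(report_text))


def crash_rate(files_with_crashes, total_files):
    """Fraction of analyzed log files that contain a crash."""
    return files_with_crashes / total_files


# Abbreviated sample in the format of the report above.
sample = """\
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
Likely failing test: tests/test_cufile.py::test_stats_start_stop
Crash indicator: Fatal Python error: Floating point exception
Likely failing test: tests/test_cufile.py::test_get_stats_l3
Crash indicator: Fatal Python error: Segmentation fault
"""

for test_id, count in tally_likely_failing(sample).most_common():
    print(count, test_id)

# Numbers from the summary header: 21 files with crashes out of 200 analyzed.
print(f"crash rate: {crash_rate(21, 200):.1%}")
```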