
`test_vmm_allocator_policy_configuration` failure: Windows / A6000 / WDDM

[Open] rwgk opened this issue 3 months ago · 8 comments

Tracking the failure below.

xref: https://github.com/NVIDIA/cuda-python/pull/1242#issuecomment-3545628920

All details are in the full logs:

qa_bindings_windows_2025-11-18+102913_build_log.txt

qa_bindings_windows_2025-11-18+102913_tests_log.txt

The only non-obvious detail:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0 was installed from cuda_13.0.1_windows.exe

EDIT: The exact same error appeared when retesting with v13.0 installed from cuda_13.0.2_windows.exe

C:\Users\rgrossekunst\forked\cuda-python>nvidia-smi
Tue Nov 18 10:31:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.34                 Driver Version: 591.34         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000             WDDM  |   00000000:C1:00.0 Off |                  Off |
| 30%   31C    P8             19W /  300W |    1778MiB /  49140MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
================================== FAILURES ===================================
___________________ test_vmm_allocator_policy_configuration ___________________

    def test_vmm_allocator_policy_configuration():
        """Test VMM allocator with different policy configurations.
    
        This test verifies that VirtualMemoryResource can be configured
        with different allocation policies and that the configuration affects
        the allocation behavior.
        """
        device = Device()
        device.set_current()
    
        # Skip if virtual memory management is not supported
        if not device.properties.virtual_memory_management_supported:
            pytest.skip("Virtual memory management is not supported on this device")
    
        # Skip if GPU Direct RDMA is supported (we want to test the unsupported case)
        if not device.properties.gpu_direct_rdma_supported:
            pytest.skip("This test requires a device that doesn't support GPU Direct RDMA")
    
        # Test with custom VMM config
        custom_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="minimum",
            gpu_direct_rdma=True,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="rw",
            peer_access="rw",
        )
    
        vmm_mr = VirtualMemoryResource(device, config=custom_config)
    
        # Verify configuration is applied
        assert vmm_mr.config == custom_config
        assert vmm_mr.config.gpu_direct_rdma is True
        assert vmm_mr.config.granularity == "minimum"
    
        # Test allocation with custom config
        buffer = vmm_mr.allocate(8192)
        assert buffer.size >= 8192
        assert buffer.device_id == device.device_id
    
        # Test policy modification
        new_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="recommended",
            gpu_direct_rdma=False,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="r",  # Read-only access
            peer_access="r",
        )
    
        # Modify allocation policy
>       modified_buffer = vmm_mr.modify_allocation(buffer, 16384, config=new_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests\test_memory.py:440: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\core\experimental\_memory\_virtual_memory_resource.py:230: in modify_allocation
    raise_if_driver_error(res)
cuda\core\experimental\_utils\cuda_utils.pyx:67: in cuda.core.experimental._utils.cuda_utils._check_driver_error
    cpdef inline int _check_driver_error(cydriver.CUresult error) except?-1 nogil:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise CUDAError(f"{name.decode()}: {expl}")
E   cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.

cuda\core\experimental\_utils\cuda_utils.pyx:78: CUDAError
=========================== short test summary info ===========================
SKIPPED [6] tests\example_tests\utils.py:37: cupy not installed, skipping related tests
SKIPPED [1] tests\example_tests\utils.py:37: torch not installed, skipping related tests
SKIPPED [1] tests\example_tests\utils.py:43: skip C:\Users\rgrossekunst\forked\cuda-python\cuda_core\tests\example_tests\..\..\examples\thread_block_cluster.py
SKIPPED [5] tests\memory_ipc\test_errors.py:20: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_event_ipc.py:20: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_event_ipc.py:91: Device does not support IPC
SKIPPED [2] tests\memory_ipc\test_event_ipc.py:106: Device does not support IPC
SKIPPED [8] tests\memory_ipc\test_event_ipc.py:123: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_leaks.py:26: mempool allocation handle is not using fds or psutil is unavailable
SKIPPED [12] tests\memory_ipc\test_leaks.py:82: mempool allocation handle is not using fds or psutil is unavailable
SKIPPED [1] tests\memory_ipc\test_memory_ipc.py:16: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_memory_ipc.py:53: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_memory_ipc.py:103: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_memory_ipc.py:153: Device does not support IPC
SKIPPED [2] tests\memory_ipc\test_send_buffers.py:18: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_serialize.py:24: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_serialize.py:79: Device does not support IPC
SKIPPED [1] tests\memory_ipc\test_serialize.py:125: Device does not support IPC
SKIPPED [2] tests\memory_ipc\test_workerpool.py:29: Device does not support IPC
SKIPPED [2] tests\memory_ipc\test_workerpool.py:65: Device does not support IPC
SKIPPED [2] tests\memory_ipc\test_workerpool.py:109: Device does not support IPC
SKIPPED [1] tests\test_device.py:327: Test requires at least 2 CUDA devices
SKIPPED [1] tests\test_device.py:375: Test requires at least 2 CUDA devices
SKIPPED [1] tests\test_launcher.py:92: Driver or GPU not new enough for thread block clusters
SKIPPED [1] tests\test_launcher.py:122: Driver or GPU not new enough for thread block clusters
SKIPPED [2] tests\test_launcher.py:274: cupy not installed
SKIPPED [1] tests\test_linker.py:113: nvjitlink requires lto for ptx linking
SKIPPED [1] tests\test_memory.py:514: This test requires a device that doesn't support GPU Direct RDMA
SKIPPED [1] tests\test_memory.py:645: Driver rejects IPC-enabled mempool creation on this platform
SKIPPED [7] tests\test_module.py:345: Test requires numba to be installed
SKIPPED [2] tests\test_module.py:389: Device with compute capability 90 or higher is required for cluster support
SKIPPED [1] tests\test_module.py:404: Device with compute capability 90 or higher is required for cluster support
SKIPPED [2] tests\test_utils.py: got empty parameter set for (in_arr, use_stream)
SKIPPED [1] tests\test_utils.py: CuPy is not installed
FAILED tests/test_memory.py::test_vmm_allocator_policy_configuration - cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.
============ 1 failed, 518 passed, 75 skipped in 68.77s (0:01:08) =============

rwgk avatar Nov 18 '25 18:11 rwgk

Double-checking to be sure: The test also fails when run in isolation:

(TestVenv) PS C:\Users\rgrossekunst\forked\cuda-python\cuda_core> pytest -ra -s -v tests/test_memory.py -k test_vmm_allocator_policy_configuration
================================================================================== test session starts ==================================================================================
platform win32 -- Python 3.13.9, pytest-9.0.1, pluggy-1.6.0 -- C:\Users\rgrossekunst\forked\cuda-python\TestVenv\Scripts\python.exe
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\rgrossekunst\forked\cuda-python\cuda_core
configfile: pytest.ini
plugins: benchmark-5.2.3
collected 41 items / 40 deselected / 1 selected

tests/test_memory.py::test_vmm_allocator_policy_configuration FAILED

======================================================================================= FAILURES ========================================================================================
________________________________________________________________________ test_vmm_allocator_policy_configuration ________________________________________________________________________

    def test_vmm_allocator_policy_configuration():
        """Test VMM allocator with different policy configurations.

        This test verifies that VirtualMemoryResource can be configured
        with different allocation policies and that the configuration affects
        the allocation behavior.
        """
        device = Device()
        device.set_current()

        # Skip if virtual memory management is not supported
        if not device.properties.virtual_memory_management_supported:
            pytest.skip("Virtual memory management is not supported on this device")

        # Skip if GPU Direct RDMA is supported (we want to test the unsupported case)
        if not device.properties.gpu_direct_rdma_supported:
            pytest.skip("This test requires a device that doesn't support GPU Direct RDMA")

        # Test with custom VMM config
        custom_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="minimum",
            gpu_direct_rdma=True,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="rw",
            peer_access="rw",
        )

        vmm_mr = VirtualMemoryResource(device, config=custom_config)

        # Verify configuration is applied
        assert vmm_mr.config == custom_config
        assert vmm_mr.config.gpu_direct_rdma is True
        assert vmm_mr.config.granularity == "minimum"

        # Test allocation with custom config
        buffer = vmm_mr.allocate(8192)
        assert buffer.size >= 8192
        assert buffer.device_id == device.device_id

        # Test policy modification
        new_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="recommended",
            gpu_direct_rdma=False,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="r",  # Read-only access
            peer_access="r",
        )

        # Modify allocation policy
>       modified_buffer = vmm_mr.modify_allocation(buffer, 16384, config=new_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests\test_memory.py:440:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\core\experimental\_memory\_virtual_memory_resource.py:230: in modify_allocation
    raise_if_driver_error(res)
cuda\core\experimental\_utils\cuda_utils.pyx:67: in cuda.core.experimental._utils.cuda_utils._check_driver_error
    cpdef inline int _check_driver_error(cydriver.CUresult error) except?-1 nogil:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise CUDAError(f"{name.decode()}: {expl}")
E   cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.

cuda\core\experimental\_utils\cuda_utils.pyx:78: CUDAError
================================================================================ short test summary info ================================================================================
FAILED tests/test_memory.py::test_vmm_allocator_policy_configuration - cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.
=========================================================================== 1 failed, 40 deselected in 0.27s ============================================================================

rwgk avatar Nov 18 '25 18:11 rwgk

@leofang Same problem after switching to the 591.32 driver, everything else unchanged (I'm still using the previous build):

(TestVenv) PS C:\Users\rgrossekunst\forked\cuda-python> nvidia-smi
Tue Nov 18 14:22:04 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.32                 Driver Version: 591.32         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000             WDDM  |   00000000:C1:00.0 Off |                  Off |
| 30%   32C    P8             18W /  300W |    1225MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           16280    C+G   ...8bbwe\Microsoft.CmdPal.UI.exe      N/A      |
|    0   N/A  N/A           18072    C+G   ...y\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A           18112    C+G   ..._cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A           20708    C+G   ...xyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A           20808    C+G   ...UI3Apps\PowerToys.Peek.UI.exe      N/A      |
|    0   N/A  N/A           22460    C+G   ...indows\System32\ShellHost.exe      N/A      |
|    0   N/A  N/A           22820    C+G   ...crosoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A           23076    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           23240    C+G   ...Local\PowerToys\PowerToys.exe      N/A      |
|    0   N/A  N/A           23608    C+G   ...s\PowerToys.AdvancedPaste.exe      N/A      |
|    0   N/A  N/A           23880    C+G   ...Toys\PowerToys.FancyZones.exe      N/A      |
|    0   N/A  N/A           27088    C+G   ...8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A           27380    C+G   ...yb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A           28616    C+G   C:\Windows\explorer.exe               N/A      |
|    0   N/A  N/A           29196    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           32260    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           32996    C+G   ...2txyewy\CrossDeviceResume.exe      N/A      |
|    0   N/A  N/A           34964    C+G   ...5n1h2txyewy\TextInputHost.exe      N/A      |
+-----------------------------------------------------------------------------------------+
(TestVenv) PS C:\Users\rgrossekunst\forked\cuda-python> pytest -ra -s -v .\cuda_core\tests\test_memory.py
================================================================================= test session starts =================================================================================
platform win32 -- Python 3.13.9, pytest-9.0.1, pluggy-1.6.0 -- C:\Users\rgrossekunst\forked\cuda-python\TestVenv\Scripts\python.exe
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\rgrossekunst\forked\cuda-python\cuda_core
configfile: pytest.ini
plugins: benchmark-5.2.3
collected 41 items

cuda_core\tests\test_memory.py::test_package_contents PASSED
cuda_core\tests\test_memory.py::test_buffer_initialization PASSED
cuda_core\tests\test_memory.py::test_buffer_copy_to PASSED
cuda_core\tests\test_memory.py::test_buffer_copy_from PASSED
cuda_core\tests\test_memory.py::test_buffer_close PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack_device_success[DummyDeviceMemoryResource-expected0] PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack_device_success[DummyHostMemoryResource-expected1] PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack_device_success[DummyUnifiedMemoryResource-expected2] PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack_device_success[DummyPinnedMemoryResource-expected3] PASSED
cuda_core\tests\test_memory.py::test_buffer_dunder_dlpack_device_failure PASSED
cuda_core\tests\test_memory.py::test_device_memory_resource_initialization[True] PASSED
cuda_core\tests\test_memory.py::test_device_memory_resource_initialization[False] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_basic_allocation[handle_type0-True] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_basic_allocation[handle_type0-False] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_basic_allocation[handle_type1-True] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_basic_allocation[handle_type1-False] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_policy_configuration FAILED
cuda_core\tests\test_memory.py::test_vmm_allocator_grow_allocation[handle_type0] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_grow_allocation[handle_type1] PASSED
cuda_core\tests\test_memory.py::test_vmm_allocator_rdma_unsupported_exception SKIPPED (This test requires a device that doesn't support GPU Direct RDMA)
cuda_core\tests\test_memory.py::test_mempool PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_follow_event_dependencies-bool-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_follow_event_dependencies-bool-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_allow_opportunistic-bool-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_allow_opportunistic-bool-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_allow_internal_dependencies-bool-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reuse_allow_internal_dependencies-bool-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[release_threshold-int-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[release_threshold-int-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reserved_mem_current-int-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reserved_mem_current-int-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reserved_mem_high-int-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[reserved_mem_high-int-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[used_mem_current-int-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[used_mem_current-int-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[used_mem_high-int-True] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes[used_mem_high-int-False] PASSED
cuda_core\tests\test_memory.py::test_mempool_attributes_ownership SKIPPED (Driver rejects IPC-enabled mempool creation on this platform)
cuda_core\tests\test_memory.py::test_strided_memory_view_leak PASSED
cuda_core\tests\test_memory.py::test_strided_memory_view_refcnt PASSED

====================================================================================== FAILURES =======================================================================================
_______________________________________________________________________ test_vmm_allocator_policy_configuration _______________________________________________________________________

    def test_vmm_allocator_policy_configuration():
        """Test VMM allocator with different policy configurations.

        This test verifies that VirtualMemoryResource can be configured
        with different allocation policies and that the configuration affects
        the allocation behavior.
        """
        device = Device()
        device.set_current()

        # Skip if virtual memory management is not supported
        if not device.properties.virtual_memory_management_supported:
            pytest.skip("Virtual memory management is not supported on this device")

        # Skip if GPU Direct RDMA is supported (we want to test the unsupported case)
        if not device.properties.gpu_direct_rdma_supported:
            pytest.skip("This test requires a device that doesn't support GPU Direct RDMA")

        # Test with custom VMM config
        custom_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="minimum",
            gpu_direct_rdma=True,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="rw",
            peer_access="rw",
        )

        vmm_mr = VirtualMemoryResource(device, config=custom_config)

        # Verify configuration is applied
        assert vmm_mr.config == custom_config
        assert vmm_mr.config.gpu_direct_rdma is True
        assert vmm_mr.config.granularity == "minimum"

        # Test allocation with custom config
        buffer = vmm_mr.allocate(8192)
        assert buffer.size >= 8192
        assert buffer.device_id == device.device_id

        # Test policy modification
        new_config = VirtualMemoryResourceOptions(
            allocation_type="pinned",
            location_type="device",
            granularity="recommended",
            gpu_direct_rdma=False,
            handle_type="posix_fd" if not IS_WINDOWS else "win32_kmt",
            peers=(),
            self_access="r",  # Read-only access
            peer_access="r",
        )

        # Modify allocation policy
>       modified_buffer = vmm_mr.modify_allocation(buffer, 16384, config=new_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

cuda_core\tests\test_memory.py:440:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda_core\cuda\core\experimental\_memory\_virtual_memory_resource.py:230: in modify_allocation
    raise_if_driver_error(res)
cuda/core/experimental/_utils/cuda_utils.pyx:67: in cuda.core.experimental._utils.cuda_utils._check_driver_error
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.

cuda/core/experimental/_utils/cuda_utils.pyx:78: CUDAError
=============================================================================== short test summary info ===============================================================================
SKIPPED [1] cuda_core\tests\test_memory.py:514: This test requires a device that doesn't support GPU Direct RDMA
SKIPPED [1] cuda_core\tests\test_memory.py:645: Driver rejects IPC-enabled mempool creation on this platform
FAILED cuda_core\tests\test_memory.py::test_vmm_allocator_policy_configuration - cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_UNKNOWN: This indicates that an unknown internal error has occurred.
======================================================================= 1 failed, 38 passed, 2 skipped in 0.35s =======================================================================

rwgk avatar Nov 18 '25 22:11 rwgk

CUDA_ERROR_UNKNOWN seems to be new.

leofang avatar Nov 18 '25 23:11 leofang

The linked comment had CUDA_ERROR_INVALID_VALUE.

leofang avatar Nov 18 '25 23:11 leofang

> The linked comment had CUDA_ERROR_INVALID_VALUE.

I overlooked that. Weird, because in all the errors pasted under this issue it's CUDA_ERROR_UNKNOWN.

But I have something that's probably more interesting. Will post separately (next comment).
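For cross-checking which error the driver actually returned (UNKNOWN vs INVALID_VALUE), a small helper like the hypothetical `cu_error_name` below can map a raw `CUresult` code to its name via `cuGetErrorName` from `cuda.bindings`. This is only a sketch: it returns `None` whenever the bindings aren't importable or no usable libcuda is present, so it stays runnable off-GPU.

```python
def cu_error_name(code: int):
    """Map a raw CUresult code to its name, e.g. 999 -> "CUDA_ERROR_UNKNOWN".

    Returns None when cuda.bindings or a working CUDA driver library is
    unavailable, instead of raising.
    """
    try:
        from cuda.bindings import driver
    except ImportError:
        return None
    try:
        # cuGetErrorName only consults a lookup table; no cuInit() needed.
        err, name = driver.cuGetErrorName(driver.CUresult(code))
    except Exception:
        # e.g. libcuda could not be loaded on this machine
        return None
    if err != driver.CUresult.CUDA_SUCCESS:
        return None
    return name.decode()

# CUDA_ERROR_INVALID_VALUE is 1, CUDA_ERROR_UNKNOWN is 999
print(cu_error_name(1), cu_error_name(999))
```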

rwgk avatar Nov 18 '25 23:11 rwgk

@leofang This is now with the 581.80 driver, everything else unchanged (I'm still using the previous build):

Highlighting here (copied from the full output below):

tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED (This test requires a device that doesn't support GPU Direct RDMA)

What could explain that? I.e., does the 581 driver report GPU Direct RDMA support differently than the 591 driver, for the exact same hardware?
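To take the test (and the `device.properties` wrapper) out of the equation, the raw driver attribute could be queried directly under each driver version. A minimal sketch, assuming `cuda.bindings` is installed; the hypothetical `query_gpu_direct_rdma` helper returns `None` rather than raising when no usable driver/GPU is present:

```python
def query_gpu_direct_rdma(device_ordinal: int = 0):
    """Query CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED for one device.

    Returns the driver's answer as True/False, or None when cuda.bindings
    or a working CUDA driver is unavailable.
    """
    try:
        from cuda.bindings import driver
    except ImportError:
        return None
    try:
        (err,) = driver.cuInit(0)
        if err != driver.CUresult.CUDA_SUCCESS:
            return None
        err, dev = driver.cuDeviceGet(device_ordinal)
        if err != driver.CUresult.CUDA_SUCCESS:
            return None
        err, val = driver.cuDeviceGetAttribute(
            driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED,
            dev,
        )
        if err != driver.CUresult.CUDA_SUCCESS:
            return None
        return bool(val)
    except Exception:
        return None

print(query_gpu_direct_rdma())
```

Running this once under 581.80 and once under 591.xx on the same machine would show whether the attribute value itself flips with the driver, independent of cuda.core.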


(TestVenv) PS C:\Users\rgrossekunst\forked\cuda-python\cuda_core> nvidia-smi
Tue Nov 18 15:15:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000             WDDM  |   00000000:C1:00.0 Off |                  Off |
| 30%   34C    P8             19W /  300W |    1120MiB /  49140MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           12532    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           15836    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           18252    C+G   C:\Windows\explorer.exe               N/A      |
|    0   N/A  N/A           18376    C+G   ...indows\System32\ShellHost.exe      N/A      |
|    0   N/A  N/A           19172    C+G   ...2txyewy\CrossDeviceResume.exe      N/A      |
|    0   N/A  N/A           21228    C+G   ..._cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A           21236    C+G   ...y\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A           23888    C+G   ....0.3595.80\msedgewebview2.exe      N/A      |
|    0   N/A  N/A           25400    C+G   ...Local\PowerToys\PowerToys.exe      N/A      |
|    0   N/A  N/A           25568    C+G   ...8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A           25588    C+G   ...xyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A           26216    C+G   ...Toys\PowerToys.FancyZones.exe      N/A      |
|    0   N/A  N/A           26548    C+G   ...8bbwe\Microsoft.CmdPal.UI.exe      N/A      |
|    0   N/A  N/A           27704    C+G   ...UI3Apps\PowerToys.Peek.UI.exe      N/A      |
|    0   N/A  N/A           29224    C+G   ...5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A           33324    C+G   ...crosoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A           34224    C+G   ...yb3d8bbwe\WindowsTerminal.exe      N/A      |
+-----------------------------------------------------------------------------------------+
(TestVenv) PS C:\Users\rgrossekunst\forked\cuda-python\cuda_core> pytest -ra -s -v .\tests\test_memory.py
========================================================================= test session starts ==========================================================================
platform win32 -- Python 3.13.9, pytest-9.0.1, pluggy-1.6.0 -- C:\Users\rgrossekunst\forked\cuda-python\TestVenv\Scripts\python.exe
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\rgrossekunst\forked\cuda-python\cuda_core
configfile: pytest.ini
plugins: benchmark-5.2.3
collected 41 items

tests/test_memory.py::test_package_contents PASSED
tests/test_memory.py::test_buffer_initialization PASSED
tests/test_memory.py::test_buffer_copy_to PASSED
tests/test_memory.py::test_buffer_copy_from PASSED
tests/test_memory.py::test_buffer_close PASSED
tests/test_memory.py::test_buffer_dunder_dlpack PASSED
tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyDeviceMemoryResource-expected0] PASSED
tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyHostMemoryResource-expected1] PASSED
tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyUnifiedMemoryResource-expected2] PASSED
tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyPinnedMemoryResource-expected3] PASSED
tests/test_memory.py::test_buffer_dunder_dlpack_device_failure PASSED
tests/test_memory.py::test_device_memory_resource_initialization[True] PASSED
tests/test_memory.py::test_device_memory_resource_initialization[False] PASSED
tests/test_memory.py::test_vmm_allocator_basic_allocation[handle_type0-True] PASSED
tests/test_memory.py::test_vmm_allocator_basic_allocation[handle_type0-False] PASSED
tests/test_memory.py::test_vmm_allocator_basic_allocation[handle_type1-True] PASSED
tests/test_memory.py::test_vmm_allocator_basic_allocation[handle_type1-False] PASSED
tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED (This test requires a device that doesn't support GPU Direct RDMA)
tests/test_memory.py::test_vmm_allocator_grow_allocation[handle_type0] PASSED
tests/test_memory.py::test_vmm_allocator_grow_allocation[handle_type1] PASSED
tests/test_memory.py::test_vmm_allocator_rdma_unsupported_exception PASSED
tests/test_memory.py::test_mempool PASSED
tests/test_memory.py::test_mempool_attributes[reuse_follow_event_dependencies-bool-True] PASSED
tests/test_memory.py::test_mempool_attributes[reuse_follow_event_dependencies-bool-False] PASSED
tests/test_memory.py::test_mempool_attributes[reuse_allow_opportunistic-bool-True] PASSED
tests/test_memory.py::test_mempool_attributes[reuse_allow_opportunistic-bool-False] PASSED
tests/test_memory.py::test_mempool_attributes[reuse_allow_internal_dependencies-bool-True] PASSED
tests/test_memory.py::test_mempool_attributes[reuse_allow_internal_dependencies-bool-False] PASSED
tests/test_memory.py::test_mempool_attributes[release_threshold-int-True] PASSED
tests/test_memory.py::test_mempool_attributes[release_threshold-int-False] PASSED
tests/test_memory.py::test_mempool_attributes[reserved_mem_current-int-True] PASSED
tests/test_memory.py::test_mempool_attributes[reserved_mem_current-int-False] PASSED
tests/test_memory.py::test_mempool_attributes[reserved_mem_high-int-True] PASSED
tests/test_memory.py::test_mempool_attributes[reserved_mem_high-int-False] PASSED
tests/test_memory.py::test_mempool_attributes[used_mem_current-int-True] PASSED
tests/test_memory.py::test_mempool_attributes[used_mem_current-int-False] PASSED
tests/test_memory.py::test_mempool_attributes[used_mem_high-int-True] PASSED
tests/test_memory.py::test_mempool_attributes[used_mem_high-int-False] PASSED
tests/test_memory.py::test_mempool_attributes_ownership SKIPPED (Driver rejects IPC-enabled mempool creation on this platform)
tests/test_memory.py::test_strided_memory_view_leak PASSED
tests/test_memory.py::test_strided_memory_view_refcnt PASSED

======================================================================= short test summary info ========================================================================
SKIPPED [1] tests\test_memory.py:401: This test requires a device that doesn't support GPU Direct RDMA
SKIPPED [1] tests\test_memory.py:645: Driver rejects IPC-enabled mempool creation on this platform
==================================================================== 39 passed, 2 skipped in 0.26s =====================================================================

rwgk avatar Nov 18 '25 23:11 rwgk

> What could explain that? I.e., does the 581 driver support RDMA, but the 591 driver does not, for the exact same hardware?

Yes, most likely that's the case. Please bring this to @ksimpson-work's attention. It's possible this is a real bug in the latest driver.

leofang avatar Nov 19 '25 00:11 leofang

@leofang @ksimpson-work @rparolin

It turns out the situation is most likely more straightforward, and there is no bug in the newer driver:

  • The skip message is most likely a copy-paste mishap: https://github.com/NVIDIA/cuda-python/pull/1266

  • I see on my A6000 machine that the 581.80 Windows driver does NOT support RDMA, while, interestingly, the WSL2 Ubuntu 24.04 pass-through driver does. I validated that observation with a minimal C program:

    // Query whether the device supports GPU Direct RDMA (sets 1 if yes, 0 if no).
    CHECK_CUDA(cuDeviceGetAttribute(
        &rdma_supported,
        CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED,
        dev));

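For reference, a complete standalone version of that probe might look like the sketch below. The `CHECK_CUDA` macro shown here is a hypothetical helper (the original program's definition wasn't posted), and the build line is an assumption; running it requires the CUDA driver (`libcuda`/`nvcuda`) and a GPU.

```c
// Minimal standalone probe for GPU Direct RDMA support via the CUDA driver API.
// Build (assumption): nvcc rdma_probe.c -o rdma_probe -lcuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// Hypothetical helper: print the CUDA error name and abort on failure.
#define CHECK_CUDA(call)                                      \
    do {                                                      \
        CUresult err_ = (call);                               \
        if (err_ != CUDA_SUCCESS) {                           \
            const char *name_ = NULL;                         \
            cuGetErrorName(err_, &name_);                     \
            fprintf(stderr, "%s failed: %s\n", #call,         \
                    name_ ? name_ : "UNKNOWN");               \
            exit(1);                                          \
        }                                                     \
    } while (0)

int main(void) {
    CUdevice dev;
    int rdma_supported = 0;
    CHECK_CUDA(cuInit(0));
    CHECK_CUDA(cuDeviceGet(&dev, 0));  // device ordinal 0
    CHECK_CUDA(cuDeviceGetAttribute(
        &rdma_supported,
        CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED,
        dev));
    printf("GPU_DIRECT_RDMA_SUPPORTED: %d\n", rdma_supported);
    return 0;
}
```

On my A6000/581.80 Windows setup this prints 0, while the WSL2 pass-through driver reports 1.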
Apparently, the newer Windows driver does support RDMA, so the new test is not skipped, and we are hitting a code path that was never exercised on Windows before (I inspected our CI logs, see below).

The upshot is that we still need to debug why the test is failing here:

>       modified_buffer = vmm_mr.modify_allocation(buffer, 16384, config=new_config)

rwgk-win11.localdomain:~/logs_19475019392 $ grep 'test_vmm_allocator_policy_configuration' *Test*.txt | grep -v SKIPPED
14_Test linux-aarch64 _ py3.11, 12.9.1, wheels, a100.txt:2025-11-18T17:33:10.2323694Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 61%]
15_Test linux-aarch64 _ py3.14t, 13.0.2, local, a100.txt:2025-11-18T17:34:34.6747623Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
16_Test linux-aarch64 _ py3.13, 12.9.1, wheels, a100.txt:2025-11-18T17:38:35.1147682Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 61%]
17_Test linux-aarch64 _ py3.12, 13.0.2, wheels, a100.txt:2025-11-18T17:38:24.7111273Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
18_Test linux-aarch64 _ py3.11, 13.0.2, local, a100.txt:2025-11-18T17:38:43.5229037Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
19_Test linux-aarch64 _ py3.12, 12.9.1, local, a100.txt:2025-11-18T17:41:47.1509235Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
20_Test linux-aarch64 _ py3.10, 12.9.1, local, a100.txt:2025-11-18T17:42:21.0063612Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
21_Test linux-aarch64 _ py3.14, 13.0.2, local, a100.txt:2025-11-18T17:40:29.0920319Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
22_Test linux-aarch64 _ py3.13, 13.0.2, local, a100.txt:2025-11-18T17:34:38.5940371Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
23_Test linux-aarch64 _ py3.10, 13.0.2, wheels, a100.txt:2025-11-18T17:33:29.1723080Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
25_Test linux-64 _ py3.11, 12.9.1, wheels, rtxpro6000.txt:2025-11-18T17:36:58.2401611Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 61%]
26_Test linux-64 _ py3.10, 12.9.1, local, v100.txt:2025-11-18T17:33:42.5628444Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
27_Test linux-64 _ py3.13, 13.0.2, local, H100.txt:2025-11-18T17:34:11.6529205Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
29_Test linux-64 _ py3.14t, 13.0.2, local, l4.txt:2025-11-18T17:33:45.6132779Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
2_Test linux-64 _ py3.11, 13.0.2, local, l4.txt:2025-11-18T18:58:49.8588599Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
30_Test linux-64 _ py3.12, 12.9.1, local, l4.txt:2025-11-18T17:38:10.4892074Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
31_Test linux-64 _ py3.13, 12.9.1, wheels, v100.txt:2025-11-18T17:32:08.0532110Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 61%]
32_Test linux-64 _ py3.13, 13.0.2, local, rtxpro6000.txt:2025-11-18T17:38:09.7285341Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
33_Test linux-64 _ py3.12, 13.0.2, wheels, l4.txt:2025-11-18T17:33:12.8154845Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
34_Test linux-64 _ py3.10, 13.0.2, wheels, l4.txt:2025-11-18T17:32:35.0252092Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
35_Test linux-64 _ py3.14, 13.0.2, local, l4.txt:2025-11-18T17:33:32.4391211Z tests/test_memory.py::test_vmm_allocator_policy_configuration PASSED     [ 62%]
rwgk-win11.localdomain:~/logs_19475019392 $ grep 'test_vmm_allocator_policy_configuration' *Test*.txt | grep SKIPPED
10_Test win-64 _ py3.14, 12.9.1, wheels, v100 (TCC).txt:2025-11-18T17:40:36.5720518Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
11_Test win-64 _ py3.10, 12.9.1, wheels, rtx2080 (WDDM).txt:2025-11-18T17:42:10.2909268Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 61%]
12_Test win-64 _ py3.10, 13.0.2, local, rtxpro6000 (TCC).txt:2025-11-18T17:49:40.0507875Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
1_Test win-64 _ py3.12, 12.9.1, wheels, l4 (MCDM).txt:2025-11-18T17:52:44.4211544Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 61%]
1_Test win-64 _ py3.13, 12.9.1, local, l4 (TCC).txt:2025-11-18T19:09:39.2361308Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
2_Test win-64 _ py3.14, 13.0.2, local, l4 (MCDM).txt:2025-11-18T17:49:49.3962422Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
3_Test win-64 _ py3.11, 13.0.2, wheels, rtx4090 (WDDM).txt:2025-11-18T17:58:53.4507192Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
4_Test win-64 _ py3.11, 12.9.1, local, v100 (MCDM).txt:2025-11-18T17:49:10.4474200Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
5_Test win-64 _ py3.14t, 13.0.2, wheels, a100 (MCDM).txt:2025-11-18T17:42:05.2867410Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
7_Test win-64 _ py3.13, 13.0.2, wheels, rtxpro6000 (MCDM).txt:2025-11-18T17:48:18.3916348Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
8_Test win-64 _ py3.12, 13.0.2, local, a100 (TCC).txt:2025-11-18T17:47:09.2144717Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]
9_Test win-64 _ py3.14t, 12.9.1, local, l4 (TCC).txt:2025-11-18T17:59:07.7198765Z tests/test_memory.py::test_vmm_allocator_policy_configuration SKIPPED    [ 62%]

rwgk avatar Nov 19 '25 04:11 rwgk