
"gdrcopy_sanity" failing with NVIDIA 560 drivers on A100

Open • rafsalas19 opened this issue 1 year ago • 5 comments

OS: Ubuntu 20.04
Kernel: 5.15.0-1071
NVIDIA driver: 560.35.03
CUDA: 12.5
gdrcopy: 2.4.1

error:

 gdrcopy_sanity
Assertion "(gdr_pin_buffer(g, d_A[0], buffer_size, 0, 0, &A_mh[0])) == (0)" failed at sanity.cpp:435
Assertion "(gdr_pin_buffer(g, d_A, A_size, 0, 0, &A_mh)) == (0)" failed at sanity.cpp:344
Total: 28, Passed: 24, Failed: 2, Waived: 2

List of failed tests:
    basic_small_buffers_mapping
    basic_unaligned_mapping

List of waived tests:
    invalidation_access_after_free_cumemalloc
    invalidation_access_after_free_vmmalloc
Error: Encountered an error or a test failure with status=1

rafsalas19 • Sep 19 '24

Hi @rafsalas19,

This looks like an issue we have already fixed in the master branch. May I ask you to try it out? You will need to compile and install the gdrdrv driver to get this fix.

pakmarkthub • Sep 20 '24

OK, thanks. Let me try.

rafsalas19 • Sep 20 '24

OK, this worked. Thanks! Can you let me know when a release/tag containing this fix will be published?

rafsalas19 • Sep 23 '24

@pakmarkthub Following up on this. We only deploy tagged releases on our systems. When can we expect a release that includes this fix, which appears to be 7 months old?

This is currently blocking us from adopting new NVIDIA Tesla drivers, and therefore new CUDA runtimes as well.

LiquidPT • Oct 15 '24

@LiquidPT FYI, we recently tagged release 2.4.2.

drossetti • Nov 18 '24
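The fix lives in the gdrdrv kernel module, so a site that only deploys tagged releases may want to gate driver upgrades on the installed gdrcopy/gdrdrv version. Below is a minimal Python sketch of such a gate. It assumes that 2.4.2 is the first release carrying the fix (the thread implies this but does not state it outright). It also assumes you already have the installed version as a string, for example from your package manager. The names `MIN_GDRCOPY`, `parse_version`, and `has_fix` are hypothetical and not part of gdrcopy.

```python
import re
from typing import Tuple

# Assumption: 2.4.2 is the first gdrcopy release containing the gdrdrv fix
# discussed in this thread. Adjust if the release notes say otherwise.
MIN_GDRCOPY = "2.4.2"


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn "2.4.1" (or a package string like "2.4.2-1") into a tuple such as (2, 4, 1).

    Anything after the leading dotted-number run (e.g. a "-1" packaging
    revision) is ignored; short versions are zero-padded to `width` parts.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (width - len(parts))  # "2.5" -> [2, 5, 0]
    return tuple(parts)


def has_fix(installed: str, minimum: str = MIN_GDRCOPY) -> bool:
    """True if the installed version is at least `minimum` (tuple comparison)."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("2.4.1", "2.4.2", "2.4.2-1", "2.5"):
        print(version, has_fix(version))
```

Comparing integer tuples avoids the string-ordering trap where "2.10" would sort before "2.4". Zero-padding makes "2.4" and "2.4.0" compare as equal.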