SparseArrays.jl icon indicating copy to clipboard operation
SparseArrays.jl copied to clipboard

Zero out all values and pointers before passing to CHOLMOD

Open ViralBShah opened this issue 3 years ago • 2 comments

Just like https://github.com/JuliaSparse/SparseArrays.jl/pull/183, we should ensure CHOLMOD wrappers also zero out everything.

Some examples of the cholmod crashes: https://build.julialang.org/#/builders/63/builds/6826/steps/5/logs/stdio

      From worker 3:	Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 3:	Exception: EXCEPTION_ACCESS_VIOLATION at 0xe1a03914 -- .text at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 3:	in expression starting at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\test\cholmod.jl:18
      From worker 3:	.text at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 3:	read_triplet at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 3:	cholmod_l_read_triplet at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 3:	cholmod_l_read_sparse at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 3:	cholmod_l_read_sparse at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\lib\x86_64-w64-mingw32.jl:932

https://build.julialang.org/#/builders/63/builds/6775/steps/5/logs/stdio

      From worker 2:	Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 2:	Exception: EXCEPTION_ACCESS_VIOLATION at 0x122be1ae3 -- __fu1368_SuiteSparse_config at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 2:	in expression starting at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\test\cholmod.jl:18
      From worker 2:	__fu1368_SuiteSparse_config at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
      From worker 2:	cholmod_l_norm_dense at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\lib\x86_64-w64-mingw32.jl:1276
      From worker 2:	unknown function (ip: 0000000068aaa635)
      From worker 2:	_jl_invoke at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2393 [inlined]
      From worker 2:	ijl_apply_generic at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2575
      From worker 2:	norm_dense at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\cholmod.jl:434
      From worker 2:	unknown function (ip: 0000000068aaa461)

ViralBShah avatar Jul 14 '22 12:07 ViralBShah

cholmod.jl uses the unsafe_load(pointer(X)) pattern. Is that something we should replace with something safer? This all suggests that something in one of the wrappers is passing too small a size to the C libraries leading to corruption in upredictable ways based on the order of execution.

@gnimuc

ViralBShah avatar Jul 14 '22 12:07 ViralBShah

X is a mutable struct that wraps a raw pointer Ptr. pointer(X) is defined as pointer(X) = Base.unsafe_convert(Ptr{XXX}, X) and Base.unsafe_convert is basically X.ptr. So, unsafe_load(pointer(X)) is basically unsafe_load(X.ptr), which is OK as long as X.ptr(allocated by the C library) is persistent.

Gnimuc avatar Jul 14 '22 13:07 Gnimuc