SparseArrays.jl
SparseArrays.jl copied to clipboard
Zero out all values and pointers before passing to CHOLMOD
Just like https://github.com/JuliaSparse/SparseArrays.jl/pull/183, we should ensure CHOLMOD wrappers also zero out everything.
Some examples of the cholmod crashes: https://build.julialang.org/#/builders/63/builds/6826/steps/5/logs/stdio
From worker 3: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
From worker 3: Exception: EXCEPTION_ACCESS_VIOLATION at 0xe1a03914 -- .text at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 3: in expression starting at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\test\cholmod.jl:18
From worker 3: .text at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 3: read_triplet at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 3: cholmod_l_read_triplet at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 3: cholmod_l_read_sparse at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 3: cholmod_l_read_sparse at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\lib\x86_64-w64-mingw32.jl:932
https://build.julialang.org/#/builders/63/builds/6775/steps/5/logs/stdio
From worker 2: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
From worker 2: Exception: EXCEPTION_ACCESS_VIOLATION at 0x122be1ae3 -- __fu1368_SuiteSparse_config at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 2: in expression starting at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\test\cholmod.jl:18
From worker 2: __fu1368_SuiteSparse_config at C:\buildbot\worker-tabularasa\tester_win64\build\bin\libcholmod.DLL (unknown line)
From worker 2: cholmod_l_norm_dense at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\lib\x86_64-w64-mingw32.jl:1276
From worker 2: unknown function (ip: 0000000068aaa635)
From worker 2: _jl_invoke at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2393 [inlined]
From worker 2: ijl_apply_generic at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2575
From worker 2: norm_dense at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.9\SparseArrays\src\solvers\cholmod.jl:434
From worker 2: unknown function (ip: 0000000068aaa461)
cholmod.jl uses the unsafe_load(pointer(X)) pattern. Is that something we should replace with something safer? This all suggests that something in one of the wrappers is passing too small a size to the C libraries leading to corruption in upredictable ways based on the order of execution.
@gnimuc
X is a mutable struct that wraps a raw pointer Ptr. pointer(X) is defined as pointer(X) = Base.unsafe_convert(Ptr{XXX}, X) and Base.unsafe_convert is basically X.ptr. So, unsafe_load(pointer(X)) is basically unsafe_load(X.ptr), which is OK as long as X.ptr(allocated by the C library) is persistent.