Kilosort
Segmentation fault on splitting cluster
I'm having the same problem as issue #85 although that one is closed.
I printed mean(diag(StS)) in every iteration as suggested, and this is what I see:
ans =
1×1 single gpuArray
1.3212
ans =
1×1 single gpuArray
0.8612
ans =
1×1 single gpuArray
9.0286
ans =
1×1 single gpuArray
NaN
I was using ops.minfr_goodchannels = 0.2;
In another run I changed that option as mentioned in #85 to ops.minfr_goodchannels = 2.0;
ans =
1x1 single gpuArray
1.4297
ans =
1x1 single gpuArray
1.4946
ans =
1x1 single gpuArray
1.8152
ans =
1x1 single gpuArray
7.4107
ans =
1x1 single gpuArray
1.1841e+03
ans =
1x1 single gpuArray
47210988
ans =
1x1 single gpuArray
NaN
Strangely, for this last configuration I ran the script step by step and it did not fail.
I ran it several times, printing the k and ik indexes to see whether it fails at the same iteration, but it does not. Sometimes it finishes OK; other times it fails, and the failure can occur at any iteration. In these runs the output increases sharply, but in other runs it just jumps from around 1 to NaN in the next iteration.
I cannot find any clue to the problem.
What I can see is that, as mentioned in #85, a higher value of ops.minfr_goodchannels reduces the failure probability. I'm using 2.0 now, although in the end there are some channels that had only one unit, with a firing rate less than 0.1.
That's the firing rate per channel, so it is possible to get units with lower firing rates.
Can you try to add a small value to the diagonal of StS in the matrix inversion? Something like StS + 1e-3 * eye(size(StS)).
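To illustrate why the small diagonal offset can help, here is a minimal sketch on a toy matrix. It is not Kilosort code: the 2x2 StS and the StMu vector are made-up stand-ins, and lambda and StSreg are names introduced only for this example. It runs on the CPU in single precision, whereas the real StS is a gpuArray.

```matlab
% Toy illustration only: these values are invented, not taken from Kilosort.
StS  = single([1 1; 1 1]);   % rank-deficient, so the plain solve is ill-posed
StMu = single([1; 1]);

lambda = single(1e-3);       % small ridge term, as suggested above
StSreg = StS + lambda * eye(size(StS), 'single');

w = StMu' / StSreg;          % regularized solve, same form as the suggestion

fprintf('cond(StS)    = %g\n', cond(StS));
fprintf('cond(StSreg) = %g\n', cond(StSreg));
disp(w);
```

The offset bounds the smallest eigenvalue away from zero, so the solve stays finite even when StS is singular. It does not help if StS already contains NaN or Inf entries before the solve.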
I was struggling with the same issue. Adding the small offset to the StS did solve it, thank you Marius.
After updating Kilosort2 with all the commits from May, I was able to sort most of the datasets that I had trouble with before, without needing to change anything, using ops.minfr_goodchannels = 0.2.
I still have a few datasets that fail with this error.
For those, I tried changing this line https://github.com/MouseLand/Kilosort2/blob/baf5e64570ddcf27d473dc503eceb618259d0a33/postProcess/splitAllClusters.m#L106 to
w = StMu'/(StS+1e-3 * eye(size(StS)))
but this alone did not solve it when using ops.minfr_goodchannels = 0.2.
Combined with ops.minfr_goodchannels = 1.0, it is working.
I'm having this issue as well. Is there any potential side effect to adding an offset to the StS matrix? If not, is the algorithm going to be updated to catch this error?
@yagui minfr_goodchannels = 1 is a little too large, and might throw out a lot of channels.
Try the latest version to see if it was solved, there were some cases where clusters with no spikes potentially made it to this step.
@marius10p Just wanted to report that I got this error even in a newer release of Kilosort2.5, which I cloned on Nov. 13, 2020, so it appears it has not been fully resolved.
Using ops.minfr_goodchannels = 1 fixes the issue for me. Notably, even changing the appropriate line in splitAllClusters.m to w = StMu'/(StS+1e-3 * eye(size(StS))), as suggested, does not resolve it when ops.minfr_goodchannels = 0.1.
Update for other folks with this issue:
After more debugging, I realized this happened only on Linux (Ubuntu 18.04), not Windows. Everyone else with this bug also appears to be a Linux user. The problem appears to be the version of mkl that ships with Matlab for Linux (2019b at least). Running version -blas and version -lapack in the Matlab command line revealed that it was using versions from 2018. Following Eric Wu's suggestion in #85, I installed the most recent version of mkl from Intel and set the paths by sourcing /opt/intel/oneapi/setvars.sh in my bashrc file. I confirmed that the versions of blas and lapack had subsequently changed in Matlab. This resolves the issue for me, and hopefully will resolve it for at least one other person out there. No fiddling with minfr_goodchannels or StS required.
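The version check described above can also be scripted, for example to log it at the start of a sorting run. This is a minimal sketch using the functional form of the same version -blas and version -lapack commands; the variable names blasInfo and lapackInfo are illustrative.

```matlab
% Query the BLAS and LAPACK builds loaded by this MATLAB session.
blasInfo   = version('-blas');
lapackInfo = version('-lapack');

% Print them so the run log records which libraries were in use.
fprintf('BLAS:   %s\n', blasInfo);
fprintf('LAPACK: %s\n', lapackInfo);
```

Comparing this output before and after changing the environment is a quick way to confirm whether MATLAB actually picked up the newer libraries.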
Thanks for posting this solution @alowet. The Python version is almost ready...
Thanks, @alowet. This worked for me. In fact, I didn't want to install the full OneAPI toolkit, but found that I could achieve the same result by either using the latest version of MKL in the official Ubuntu repos (link), or by just installing the OneAPI runtime libraries (link). Both of these options lack a setvars.sh script, and I don't know what that contains, but simply setting the BLAS_VERSION and LAPACK_VERSION environment variables worked fine. For instance, if using mkl from the Ubuntu repos:
setenv('BLAS_VERSION', '/usr/lib/x64_86-linux-gnu/mkl/liblas.so'); setenv('LAPACK_VERSION', '/usr/lib/x64_86-linux-gnu/mkl/liblapack.so')
Thank you!
@marius10p Will the Python port initially reflect KS v2.0, or is KS v2.5 also coming soon? Many thanks.
Note that there is a typo in the paths in @grahamfindlay's setenv example above. The setenv calls themselves do not raise warnings, but the typo leads to the following error:
Warning: Escaped character '\' is not valid. See 'doc sprintf' for supported special characters.
The paths should instead read:
setenv('BLAS_VERSION', '/usr/lib/x86_64-linux-gnu/mkl/libblas.so'); setenv('LAPACK_VERSION', '/usr/lib/x86_64-linux-gnu/mkl/liblapack.so')
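Because a mistyped path is accepted silently by setenv, it may be worth checking that the files exist first. This is a minimal sketch using the corrected paths from this thread; the variable names are illustrative, and it is an assumption (not confirmed here) that the calls need to run before MATLAB first loads its linear algebra libraries, for example from startup.m.

```matlab
% Corrected library paths for mkl from the Ubuntu repos (from this thread).
blasLib   = '/usr/lib/x86_64-linux-gnu/mkl/libblas.so';
lapackLib = '/usr/lib/x86_64-linux-gnu/mkl/liblapack.so';

if isfile(blasLib) && isfile(lapackLib)
    % Both files exist, so point MATLAB at them.
    setenv('BLAS_VERSION', blasLib);
    setenv('LAPACK_VERSION', lapackLib);
else
    % Catch typos or a missing mkl install instead of failing later.
    warning('MKL libraries not found; BLAS_VERSION and LAPACK_VERSION left unchanged.');
end
```

Afterwards, version -blas and version -lapack show whether the new libraries are in use.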
@grahamfindlay's solution works for me as well (no segfault during splitting, and ks2.5 finishes properly), but I get a number of warnings like this one during the main loop:
Time 3499s. Optimizing templates ...
3500.07 sec, 1 / 1285 batches, 49 units, nspks: 16.7069, mu: 10.0000, nst0: 124, merges: 1.3000, 0.0000, 3.0000
3500.07 sec, 1 / 1285 batches, 49 units, nspks: 16.7069, mu: 10.0000, nst0: 124, merges: 1.3000, 0.0000, 3.0000
Warning: Error updating Axes.
Cannot untransform a point, matrix may be invalid or singular
> In defaulterrorcallback (line 12)
In title (line 54)
In make_fig (line 26)
In learnTemplates (line 258)
In learnAndSolve8b (line 35)
In kilosort2_5_master (line 36)
Warning: Error updating Axes.
Cannot untransform a point, matrix may be invalid or singular
Ubuntu 20.04, Matlab R2019b
blas version: 'Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications, CNR branch unknown'
lapack version: 'Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications, CNR branch unknown Linear Algebra PACKage Version 3.9.0'
Hi, I think I have the same issue as the one many of you mentioned above. I am also working on Ubuntu 18.04 and Matlab 2019b. I installed the new version of the Intel oneAPI Toolkit but did not succeed in getting Matlab to use it, even after adding the line "source /opt/intel/oneapi/setvars.sh" to my bashrc. In the end, version -blas in Matlab still reports the 2018 version.
And I still have the following error during the splitting procedure:
Segmentation violation detected at Thu Dec 09 17:10:52 2021 +0100
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
Desktop Environment : ubuntu:GNOME
GNU C Library : 2.27 stable
Graphics Driver : NVIDIA Corporation GeForce GTX 1080 Ti/PCIe/SSE2 Version 4.6.0 NVIDIA 418.39
Graphics card 1 : 0x10de ( 0x10de ) 0x1b06 Version 418.39.0.0 (0-0-0)
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : glnxa64
MATLAB Entitlement ID : 790777
MATLAB Root : /home/dprn/MATLAB
MATLAB Version : 9.7.0.1737446 (R2019b) Update 9
OpenGL : hardware
Operating System : Ubuntu 18.04.6 LTS
Process ID : 27563
Processor ID : x86 Family 6 Model 85 Stepping 4, GenuineIntel
Session Key : d5c7d129-39dd-4f9d-a7d3-4e1c1f8bd748
Static TLS mitigation : Enabled: Full
Window System : The X.Org Foundation (11906000), display :1
Fault Count: 1
Abnormal termination: Segmentation violation
Register State (from fault):
RAX = 000000007c425e1a RBX = 0000000000000180 RCX = 00007fe8a857aee8 RDX = 000000000000000f RSP = 00007fed26edc3c0 RBP = 0000000000000002 RSI = 000000000000003c RDI = 000000007c425e19
R8 = 0000000000007fff R9 = 00007fe6b74e3884 R10 = 00007fe6b72fec90 R11 = 000000007c425df8 R12 = 0000000000000004 R13 = 0000000000000001 R14 = ffffffffffffffff R15 = 00007fe6b74e3680
RIP = 00007fe877efa6c7 EFL = 0000000000010202
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007fe877efa6c7 /home/dprn/MATLAB/bin/glnxa64/mkl.so+10729159 mkl_lapack_sgetf2+00000279
[ 1] 0x00007fe87826969b /home/dprn/MATLAB/bin/glnxa64/mkl.so+14329499 mkl_lapack_sgetrf_local+00001563
[ 2] 0x00007fe878269389 /home/dprn/MATLAB/bin/glnxa64/mkl.so+14328713 mkl_lapack_sgetrf_local+00000777
[ 3] 0x00007fe878269389 /home/dprn/MATLAB/bin/glnxa64/mkl.so+14328713 mkl_lapack_sgetrf_local+00000777
[ 4] 0x00007fe8779f3d69 /home/dprn/MATLAB/bin/glnxa64/mkl.so+05459305 mkl_lapack_sgetrf_int+00008217
[ 5] 0x00007fe877929eae /home/dprn/MATLAB/bin/glnxa64/mkl.so+04632238 mkl_lapack_sgetrf+00000718
[ 6] 0x00007fe8778cb9ca /home/dprn/MATLAB/bin/glnxa64/mkl.so+04245962 sgetrf+00000154
[ 7] 0x00007fe88f8cf90d /home/dprn/MATLAB/bin/glnxa64/libmwmagma.so+01628429 magma_sgetrf_gpu+00002877
[ 8] 0x00007fe8b6e1d599 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+08418713
[ 9] 0x00007fe8b6c5b7f2 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+06576114
[ 10] 0x00007fe8b6c5cabc /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+06580924
[ 11] 0x00007fe8b6e0c1a0 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+08348064
[ 12] 0x00007fe8b6c65ff8 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+06619128
[ 13] 0x00007fe8b6c66652 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+06620754
[ 14] 0x00007fe8b6c67a7a /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+06625914
[ 15] 0x00007fe8b6f049d7 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+09365975
[ 16] 0x00007fe8b6f04ee7 /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+09367271
[ 17] 0x00007fe8b6a46c0c /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+04393996
[ 18] 0x00007fe8b692809d /home/dprn/MATLAB/bin/glnxa64/libmwgpu.so+03219613
[ 19] 0x00007fed2510d5c8 /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+04040136
[ 20] 0x00007fed25110948 /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+04053320
[ 21] 0x00007fed25118942 /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+04086082
[ 22] 0x00007fed2510c7d1 /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+04036561
[ 23] 0x00007fed2529965f /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+05662303
[ 24] 0x00007fed25292ab7 /home/dprn/MATLAB/bin/glnxa64/libmwmcos_impl.so+05634743
[ 25] 0x00007fed2fae1f86 /home/dprn/MATLAB/bin/glnxa64/libmwm_dispatcher.so+00561030
[ 26] 0x00007fed2fae22d1 /home/dprn/MATLAB/bin/glnxa64/libmwm_dispatcher.so+00561873 ZN18Mfh_MATLAB_fn_impl8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDestroy_deleterEEiPPS1+00000033
[ 27] 0x00007fed2d0e8a63 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14006883
[ 28] 0x00007fed2d0ed816 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14026774
[ 29] 0x00007fed2d1f483e /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+15104062
[ 30] 0x00007fed2d1e7b91 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+15051665
[ 31] 0x00007fed2d1e7c48 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+15051848
[ 32] 0x00007fed2d1547b4 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14448564
[ 33] 0x00007fed2d17bb4d /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14609229
[ 34] 0x00007fed2ca4d330 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+07078704
[ 35] 0x00007fed2ca35942 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+06981954
[ 36] 0x00007fed2ca39e23 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+06999587
[ 37] 0x00007fed2cfa4891 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+12679313
[ 38] 0x00007fed2d0d5fe1 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+13930465
[ 39] 0x00007fed2d0d70dd /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+13934813
[ 40] 0x00007fed2fb5f15f /home/dprn/MATLAB/bin/glnxa64/libmwm_dispatcher.so+01073503 ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2_iS2+00000207
[ 41] 0x00007fed2fb60c5e /home/dprn/MATLAB/bin/glnxa64/libmwm_dispatcher.so+01080414
[ 42] 0x00007fed2fb611a1 /home/dprn/MATLAB/bin/glnxa64/libmwm_dispatcher.so+01081761 ZN8Mfh_file8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDestroy_deleterEEiPPS1+00000033
[ 43] 0x00007fed2d0e8a63 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14006883
[ 44] 0x00007fed2d0ed816 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14026774
[ 45] 0x00007fed2d1f483e /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+15104062
[ 46] 0x00007fed2d1e7cec /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+15052012
[ 47] 0x00007fed2d1547b4 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14448564
[ 48] 0x00007fed2d17adfd /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+14605821
[ 49] 0x00007fed2c8fe2db /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05706459
[ 50] 0x00007fed2c900514 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05715220
[ 51] 0x00007fed2c8fd2bd /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05702333
[ 52] 0x00007fed2c8ea791 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05625745
[ 53] 0x00007fed2c8ea9c9 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05626313
[ 54] 0x00007fed2c8fcac6 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05700294
[ 55] 0x00007fed2c8fcbc6 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+05700550
[ 56] 0x00007fed2ca366e9 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+06985449
[ 57] 0x00007fed2ca39e23 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+06999587
[ 58] 0x00007fed2cfa4891 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+12679313
[ 59] 0x00007fed2cf53063 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+12345443
[ 60] 0x00007fed2cf570af /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+12361903
[ 61] 0x00007fed2cf5a0e2 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+12374242
[ 62] 0x00007fed2cff372f /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+13002543
[ 63] 0x00007fed2cff3a19 /home/dprn/MATLAB/bin/glnxa64/libmwm_lxe.so+13003289
[ 64] 0x00007fed2ed1a4c4 /home/dprn/MATLAB/bin/glnxa64/libmwbridge.so+00341188 _Z8mnParserv+00000596
[ 65] 0x00007fed2fcb45b5 /home/dprn/MATLAB/bin/glnxa64/libmwmcr.so+01017269
[ 66] 0x00007fed447e942b /home/dprn/MATLAB/bin/glnxa64/libmwmvm.so+03097643 ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_13unique_futureIDTclfp_EEEEEERKT+00000059
[ 67] 0x00007fed447e9518 /home/dprn/MATLAB/bin/glnxa64/libmwmvm.so+03097880 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000024
[ 68] 0x00007fed2fe0489c /home/dprn/MATLAB/bin/glnxa64/libmwiqm.so+00751772 _ZN7mwboost6detail8function21function_obj_invoker0ISt8functionIFNS_3anyEvEES4_E6invokeERNS1_15function_bufferE+00000028
[ 69] 0x00007fed2fe04557 /home/dprn/MATLAB/bin/glnxa64/libmwiqm.so+00750935 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tag+00000439
[ 70] 0x00007fed2fca3015 /home/dprn/MATLAB/bin/glnxa64/libmwmcr.so+00946197
[ 71] 0x00007fed2fde96a0 /home/dprn/MATLAB/bin/glnxa64/libmwiqm.so+00640672
[ 72] 0x00007fed2fdcde01 /home/dprn/MATLAB/bin/glnxa64/libmwiqm.so+00527873
[ 73] 0x00007fed2fdcea7f /home/dprn/MATLAB/bin/glnxa64/libmwiqm.so+00531071
[ 74] 0x00007fed2fc8a575 /home/dprn/MATLAB/bin/glnxa64/libmwmcr.so+00845173
[ 75] 0x00007fed2fc8ab93 /home/dprn/MATLAB/bin/glnxa64/libmwmcr.so+00846739
[ 76] 0x00007fed2fc8b404 /home/dprn/MATLAB/bin/glnxa64/libmwmcr.so+00848900
[ 77] 0x00007fed42ac6bdd /home/dprn/MATLAB/bin/glnxa64/libmwboost_thread.so.1.65.1+00080861
[ 78] 0x00007fed43a8b6db /lib/x86_64-linux-gnu/libpthread.so.0+00030427
[ 79] 0x00007fed4321c71f /lib/x86_64-linux-gnu/libc.so.6+01185567 clone+00000063
[ 80] 0x0000000000000000
Could anyone tell me what I could test next to make it work? Thank you very much!