Kilosort

BUG: <CUDA out of memory>

Open · hemant22 opened this issue · 16 comments

Describe the issue:

Kilosort shows an error during the Final clustering step. This happened three times, always with the same session (data); Kilosort works fine on other sessions.

Reproduce the bug:

No response

Error message:

07-25 16:18 kilosort.gui.sorter INFO     Kilosort version 4.0.13
07-25 16:18 kilosort.gui.sorter INFO     Sorting Y:\Users\hsrivastava\BS\W5006\W5006_20240723_Loc1_g0_t0.imec0.ap.bin
07-25 16:18 kilosort.gui.sorter INFO     ----------------------------------------
07-25 16:18 kilosort.gui.sorter DEBUG    Initial ops:
{   'data_file_path': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'results_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006/kilosort4'),
    'probe_name': 'neuropixPhase3B1_kilosortChanMap.mat',
    'data_dtype': 'int16',
    'n_chan_bin': 385,
    'fs': 30000.0,
    'batch_size': 60000,
    'nblocks': 1,
    'Th_universal': 9.0,
    'Th_learned': 8.0,
    'tmin': 0.0,
    'tmax': inf,
    'nt': 61,
    'shift': None,
    'scale': None,
    'artifact_threshold': inf,
    'nskip': 25,
    'whitening_range': 32,
    'highpass_cutoff': 300.0,
    'binning_depth': 5.0,
    'sig_interp': 20.0,
    'drift_smoothing': [0.5, 0.5, 0.5],
    'nt0min': 20,
    'dmin': None,
    'dminx': 32.0,
    'min_template_size': 10.0,
    'template_sizes': 5,
    'nearest_chans': 10,
    'nearest_templates': 100,
    'max_channel_distance': None,
    'templates_from_data': True,
    'n_templates': 6,
    'n_pcs': 6,
    'Th_single_ch': 6.0,
    'acg_threshold': 0.2,
    'ccg_threshold': 0.25,
    'cluster_downsampling': 20,
    'x_centers': None,
    'duplicate_spike_ms': 0.25,
    'save_preprocessed_copy': True,
    'data_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006'),
    'filename': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'do_CAR': True,
    'invert_sign': False,
    'NTbuff': 60122,
    'Nchan': 383,
    'duplicate_spike_bins': 7,
    'torch_device': 'cuda',
    'xc': array([43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27.], dtype=float32),
    'yc': array([  20.,   20.,   40.,   40.,   60.,   60.,   80.,   80.,  100.,
        100.,  120.,  120.,  140.,  140.,  160.,  160.,  180.,  180.,
        200.,  200.,  220.,  220.,  240.,  240.,  260.,  260.,  280.,
        280.,  300.,  300.,  320.,  320.,  340.,  340.,  360.,  360.,
        380.,  380.,  400.,  400.,  420.,  420.,  440.,  440.,  460.,
        460.,  480.,  480.,  500.,  500.,  520.,  520.,  540.,  540.,
        560.,  560.,  580.,  580.,  600.,  600.,  620.,  620.,  640.,
        640.,  660.,  660.,  680.,  680.,  700.,  700.,  720.,  720.,
        740.,  740.,  760.,  760.,  780.,  780.,  800.,  800.,  820.,
        820.,  840.,  840.,  860.,  860.,  880.,  880.,  900.,  900.,
        920.,  920.,  940.,  940.,  960.,  960.,  980.,  980., 1000.,
       1000., 1020., 1020., 1040., 1040., 1060., 1060., 1080., 1080.,
       1100., 1100., 1120., 1120., 1140., 1140., 1160., 1160., 1180.,
       1180., 1200., 1200., 1220., 1220., 1240., 1240., 1260., 1260.,
       1280., 1280., 1300., 1300., 1320., 1320., 1340., 1340., 1360.,
       1360., 1380., 1380., 1400., 1400., 1420., 1420., 1440., 1440.,
       1460., 1460., 1480., 1480., 1500., 1500., 1520., 1520., 1540.,
       1540., 1560., 1560., 1580., 1580., 1600., 1600., 1620., 1620.,
       1640., 1640., 1660., 1660., 1680., 1680., 1700., 1700., 1720.,
       1720., 1740., 1740., 1760., 1760., 1780., 1780., 1800., 1800.,
       1820., 1820., 1840., 1840., 1860., 1860., 1880., 1880., 1900.,
       1900., 1920., 1940., 1940., 1960., 1960., 1980., 1980., 2000.,
       2000., 2020., 2020., 2040., 2040., 2060., 2060., 2080., 2080.,
       2100., 2100., 2120., 2120., 2140., 2140., 2160., 2160., 2180.,
       2180., 2200., 2200., 2220., 2220., 2240., 2240., 2260., 2260.,
       2280., 2280., 2300., 2300., 2320., 2320., 2340., 2340., 2360.,
       2360., 2380., 2380., 2400., 2400., 2420., 2420., 2440., 2440.,
       2460., 2460., 2480., 2480., 2500., 2500., 2520., 2520., 2540.,
       2540., 2560., 2560., 2580., 2580., 2600., 2600., 2620., 2620.,
       2640., 2640., 2660., 2660., 2680., 2680., 2700., 2700., 2720.,
       2720., 2740., 2740., 2760., 2760., 2780., 2780., 2800., 2800.,
       2820., 2820., 2840., 2840., 2860., 2860., 2880., 2880., 2900.,
       2900., 2920., 2920., 2940., 2940., 2960., 2960., 2980., 2980.,
       3000., 3000., 3020., 3020., 3040., 3040., 3060., 3060., 3080.,
       3080., 3100., 3100., 3120., 3120., 3140., 3140., 3160., 3160.,
       3180., 3180., 3200., 3200., 3220., 3220., 3240., 3240., 3260.,
       3260., 3280., 3280., 3300., 3300., 3320., 3320., 3340., 3340.,
       3360., 3360., 3380., 3380., 3400., 3400., 3420., 3420., 3440.,
       3440., 3460., 3460., 3480., 3480., 3500., 3500., 3520., 3520.,
       3540., 3540., 3560., 3560., 3580., 3580., 3600., 3600., 3620.,
       3620., 3640., 3640., 3660., 3660., 3680., 3680., 3700., 3700.,
       3720., 3720., 3740., 3740., 3760., 3760., 3780., 3780., 3800.,
       3800., 3820., 3820., 3840., 3840.], dtype=float32),
    'kcoords': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
    'chanMap': array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 192, 193, 194, 195,
       196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
       209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,
       222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234,
       235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247,
       248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260,
       261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273,
       274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286,
       287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
       300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312,
       313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325,
       326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338,
       339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351,
       352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364,
       365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377,
       378, 379, 380, 381, 382, 383]),
    'n_chan': 384}

07-25 16:18 kilosort.run_kilosort INFO      
07-25 16:18 kilosort.run_kilosort INFO     Computing preprocessing variables.
07-25 16:18 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:18 kilosort.run_kilosort INFO     N samples: 256380588
07-25 16:18 kilosort.run_kilosort INFO     N seconds: 8546.0196
07-25 16:18 kilosort.run_kilosort INFO     N batches: 4274
07-25 16:19 kilosort.run_kilosort INFO     Preprocessing filters computed in  9.34s; total  9.35s
07-25 16:19 kilosort.run_kilosort DEBUG    hp_filter shape: torch.Size([30122])
07-25 16:19 kilosort.run_kilosort DEBUG    whiten_mat shape: torch.Size([383, 383])
07-25 16:19 kilosort.run_kilosort INFO      
07-25 16:19 kilosort.run_kilosort INFO     Computing drift correction.
07-25 16:19 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:19 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 17:43 kilosort.run_kilosort INFO     drift computed in  5078.85s; total  5088.20s
07-25 17:43 kilosort.run_kilosort DEBUG    st shape: (43130834, 6)
07-25 17:43 kilosort.run_kilosort DEBUG    yblk shape: (1,)
07-25 17:43 kilosort.run_kilosort DEBUG    dshift shape: (4274, 1)
07-25 17:43 kilosort.run_kilosort DEBUG    iKxx shape: torch.Size([383, 383])
07-25 17:43 kilosort.gui.sorter DEBUG    First batch min, max: (-36.98598, 59.449707)
07-25 18:13 kilosort.io  INFO      
07-25 18:13 kilosort.io  INFO     ========================================
07-25 18:13 kilosort.io  INFO     Saving drift-corrected copy of data to: Y:\Users\hsrivastava\BS\W5006\kilosort4\temp_wh.dat...
07-25 18:13 kilosort.io  INFO     Writing batch 0/4274...
07-25 18:16 kilosort.io  INFO     Writing batch 100/4274...
07-25 18:19 kilosort.io  INFO     Writing batch 200/4274...
07-25 18:23 kilosort.io  INFO     Writing batch 300/4274...
07-25 18:26 kilosort.io  INFO     Writing batch 400/4274...
07-25 18:29 kilosort.io  INFO     Writing batch 500/4274...
07-25 18:33 kilosort.io  INFO     Writing batch 600/4274...
07-25 18:36 kilosort.io  INFO     Writing batch 700/4274...
07-25 18:40 kilosort.io  INFO     Writing batch 800/4274...
07-25 18:43 kilosort.io  INFO     Writing batch 900/4274...
07-25 18:47 kilosort.io  INFO     Writing batch 1000/4274...
07-25 18:50 kilosort.io  INFO     Writing batch 1100/4274...
07-25 18:53 kilosort.io  INFO     Writing batch 1200/4274...
07-25 18:57 kilosort.io  INFO     Writing batch 1300/4274...
07-25 19:00 kilosort.io  INFO     Writing batch 1400/4274...
07-25 19:04 kilosort.io  INFO     Writing batch 1500/4274...
07-25 19:07 kilosort.io  INFO     Writing batch 1600/4274...
07-25 19:11 kilosort.io  INFO     Writing batch 1700/4274...
07-25 19:14 kilosort.io  INFO     Writing batch 1800/4274...
07-25 19:18 kilosort.io  INFO     Writing batch 1900/4274...
07-25 19:21 kilosort.io  INFO     Writing batch 2000/4274...
07-25 19:25 kilosort.io  INFO     Writing batch 2100/4274...
07-25 19:28 kilosort.io  INFO     Writing batch 2200/4274...
07-25 19:32 kilosort.io  INFO     Writing batch 2300/4274...
07-25 19:35 kilosort.io  INFO     Writing batch 2400/4274...
07-25 19:39 kilosort.io  INFO     Writing batch 2500/4274...
07-25 19:42 kilosort.io  INFO     Writing batch 2600/4274...
07-25 19:46 kilosort.io  INFO     Writing batch 2700/4274...
07-25 19:49 kilosort.io  INFO     Writing batch 2800/4274...
07-25 19:53 kilosort.io  INFO     Writing batch 2900/4274...
07-25 19:57 kilosort.io  INFO     Writing batch 3000/4274...
07-25 20:00 kilosort.io  INFO     Writing batch 3100/4274...
07-25 20:04 kilosort.io  INFO     Writing batch 3200/4274...
07-25 20:07 kilosort.io  INFO     Writing batch 3300/4274...
07-25 20:11 kilosort.io  INFO     Writing batch 3400/4274...
07-25 20:14 kilosort.io  INFO     Writing batch 3500/4274...
07-25 20:18 kilosort.io  INFO     Writing batch 3600/4274...
07-25 20:22 kilosort.io  INFO     Writing batch 3700/4274...
07-25 20:25 kilosort.io  INFO     Writing batch 3800/4274...
07-25 20:29 kilosort.io  INFO     Writing batch 3900/4274...
07-25 20:33 kilosort.io  INFO     Writing batch 4000/4274...
07-25 20:36 kilosort.io  INFO     Writing batch 4100/4274...
07-25 20:40 kilosort.io  INFO     Writing batch 4200/4274...
07-25 20:42 kilosort.io  INFO     ========================================
07-25 20:42 kilosort.io  INFO     Copying finished.
07-25 20:42 kilosort.io  INFO      
07-25 20:42 kilosort.run_kilosort INFO      
07-25 20:42 kilosort.run_kilosort INFO     Extracting spikes using templates
07-25 20:42 kilosort.run_kilosort INFO     ----------------------------------------
07-25 20:42 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 23:00 kilosort.run_kilosort INFO     40724351 spikes extracted in  8260.94s; total  24100.91s
07-25 23:00 kilosort.run_kilosort DEBUG    st0 shape: (40724351, 6)
07-25 23:00 kilosort.run_kilosort DEBUG    tF shape: torch.Size([40724351, 10, 6])
07-25 23:00 kilosort.run_kilosort INFO      
07-25 23:00 kilosort.run_kilosort INFO     First clustering
07-25 23:00 kilosort.run_kilosort INFO     ----------------------------------------
07-26 00:43 kilosort.run_kilosort INFO     1916 clusters found, in  6176.53s; total  30277.48s
07-26 00:43 kilosort.run_kilosort DEBUG    clu shape: (40724351,)
07-26 00:43 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([1916, 383, 6])
07-26 00:43 kilosort.run_kilosort INFO      
07-26 00:43 kilosort.run_kilosort INFO     Extracting spikes using cluster waveforms
07-26 00:43 kilosort.run_kilosort INFO     ----------------------------------------
07-26 03:46 kilosort.run_kilosort INFO     110152832 spikes extracted in  10996.00s; total  41273.50s
07-26 03:46 kilosort.run_kilosort DEBUG    st shape: (110152832, 3)
07-26 03:46 kilosort.run_kilosort DEBUG    tF shape: torch.Size([110152832, 10, 6])
07-26 03:46 kilosort.run_kilosort DEBUG    iCC shape: torch.Size([10, 383])
07-26 03:46 kilosort.run_kilosort DEBUG    iU shape: torch.Size([1558])
07-26 03:46 kilosort.run_kilosort INFO      
07-26 03:46 kilosort.run_kilosort INFO     Final clustering
07-26 03:46 kilosort.run_kilosort INFO     ----------------------------------------
07-26 04:05 kilosort.gui.sorter ERROR    Encountered error in `run_kilosort`:
Traceback (most recent call last):
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 124, in run
    clu, Wall = cluster_spikes(
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 563, in cluster_spikes
    clu, Wall = clustering_qr.run(ops, st, tF,  mode = 'template', device=device,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 364, in run
    iclust, iclust0, M, iclust_init = cluster(Xd, nskip=nskip, lam=1,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 151, in cluster
    iclust = assign_iclust(rows_neigh, isub, kn, tones2, nclust, lam, m, ki, kj, device=device)
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 81, in assign_iclust
    xN = xN - lam/m * (ki.unsqueeze(-1) * kN.to_dense())
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Version information:

Kilosort v4.0.13; GPU: NVIDIA GeForce RTX 3070

hemant22 · Jul 26 '24

Are you sure there weren't any other GPU-intensive processes running at the time? Based on that error message, this happened because around 4 GB of video memory was reserved for something else. As a first step, I would try restarting the machine and sorting again, if you haven't done that yet.

jacobpennington · Jul 26 '24

This also happens to me regularly. I have tried many different CUDA and NVIDIA driver versions. I'm happy to provide any files you need; however, the main file this happens on is 70 GB.

Lathomas42 · Jul 29 '24

[Screenshots: gpu_usage_before_launching_kilosort_gui, gpu_usage_after_loading_data]

@jacobpennington I tried again after restarting the computer, but it stopped again with the same 'CUDA out of memory' error. Another session stopped at 'Saving to phy and computing refractory periods' with the same error. I have attached screenshots of GPU usage before and after launching the Kilosort GUI.

@Lathomas42 Thanks for your help. Can you please share more details, such as the driver versions, that might help me?

hemant22 · Aug 02 '24

My .bin file is about 200 GB.

hemant22 · Aug 02 '24

@hemant22 I still don't see anything to indicate that Kilosort is causing this, especially if you're getting the error at different points in the pipeline.

This error message: CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated.

What this says is that Kilosort is currently using ~4.3 GB of video memory. It tried to allocate an additional 1.6 GB but couldn't, because no more video memory was available. The only reason that would happen is if something else running on your machine is using up that memory, or is otherwise preventing PyTorch from using it.
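The figures in that message can be checked with a little arithmetic. The sketch below pulls the sizes out of the quoted error text and computes how much of the card was neither free nor held by PyTorch's caching allocator. Per PyTorch's wording, "allocated by PyTorch" plus "reserved by PyTorch but unallocated" is what the allocator holds; the remainder covers the process's own CUDA context plus anything else using the GPU. The helper names (grab, summarize) are illustrative only.

```python
import re

OOM = ("CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of "
       "8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated "
       "by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated.")

TO_GIB = {"bytes": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0}
SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"  # a number followed by a unit


def grab(template: str, msg: str) -> float:
    """Find template (SIZE substituted for {}) in msg and return the size in GiB."""
    m = re.search(template.format(SIZE), msg)
    if m is None:
        raise ValueError(f"pattern not found: {template}")
    return float(m.group(1)) * TO_GIB[m.group(2)]


def summarize(msg: str) -> dict:
    s = {
        "requested": grab(r"Tried to allocate {}", msg),
        "total": grab(r"total capacity of {}", msg),
        "free": grab(r"of which {} is free", msg),
        "allocated": grab(r"memory {} is allocated by PyTorch", msg),
        "cached": grab(r"and {} is reserved by PyTorch but unallocated", msg),
    }
    # Memory held by PyTorch's caching allocator.
    s["pytorch_reserved"] = s["allocated"] + s["cached"]
    # Neither free nor held by the allocator: CUDA context, display, other processes.
    s["outside_pytorch"] = s["total"] - s["free"] - s["pytorch_reserved"]
    return s


for key, gib in summarize(OOM).items():
    print(f"{key:>17}: {gib:5.2f} GiB")
```

For this particular message the allocator held about 5.06 GiB, and about 2.94 GiB of the 8 GiB card was in use outside it.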

Windows Task Manager is also not a reliable way to gauge PyTorch's memory usage. A better way to check is the nvidia-smi command in a terminal or PowerShell.
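Here is a minimal sketch of that check from Python, assuming nvidia-smi is on the PATH (it is installed with the NVIDIA driver). With --query-gpu and --format=csv,noheader,nounits, nvidia-smi prints one comma-separated line per GPU, with memory in MiB. The helper names and the sample output in the usage line are illustrative. Running "nvidia-smi -l 1" in a separate terminal gives a similar live view while a sort is in progress.

```python
import csv
import io
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse nvidia-smi CSV output (no header, no units) into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        index, name, used, total = row
        gpus.append({"index": int(index), "name": name,
                     "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpu_memory() -> list[dict]:
    """Run nvidia-smi once and return the current per-GPU memory usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Example with made-up output for a single 8 GB card:
print(parse_gpu_memory("0, NVIDIA GeForce RTX 3070, 5120, 8192\n"))
```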

jacobpennington · Aug 04 '24

@jacobpennington No one else is running anything on the machine that might be using the memory; I'm sure of that. The error happens mostly at this line: "vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)". I will run it again while monitoring memory usage with the nvidia-smi command.
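That expression builds a dense (number of spikes) × (number of centroids) matrix all at once, so its peak memory grows with both dimensions. Since 2·x·c − ‖c‖² = ‖x‖² − ‖x − c‖², the row-wise maximum of this score picks the nearest centroid. The following is for illustration only and is not Kilosort's code. It assumes only the best score per row is needed, which may not hold for every use of vexp in clustering_qr.py. With that assumption, the same scores can be computed in row chunks, so at most chunk_size × n_centroids values exist at a time. NumPy stands in for PyTorch here, and best_centroid_chunked, dense_gib and chunk_size are hypothetical names.

```python
import numpy as np


def best_centroid_chunked(Xg, Xc, chunk_size=100_000):
    """Row-wise max/argmax of 2 * Xg @ Xc.T - (Xc**2).sum(1), computed in row chunks.

    Hypothetical helper: peak extra memory is about chunk_size * n_centroids
    values instead of n_rows * n_centroids.
    """
    c_sq = (Xc ** 2).sum(1)  # squared centroid norms, shape (n_centroids,)
    n = Xg.shape[0]
    best_idx = np.empty(n, dtype=np.int64)
    best_val = np.empty(n, dtype=np.result_type(Xg, Xc))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        vexp = 2 * Xg[start:stop] @ Xc.T - c_sq  # shape (stop - start, n_centroids)
        best_idx[start:stop] = vexp.argmax(1)
        best_val[start:stop] = vexp.max(1)
    return best_idx, best_val


def dense_gib(n_rows, n_cols, itemsize=4):
    """Size in GiB of a dense n_rows x n_cols matrix (4 bytes per float32 entry)."""
    return n_rows * n_cols * itemsize / 2**30


# Hypothetical sizes: one million rows against 1,000 centroids in float32.
print(round(dense_gib(1_000_000, 1_000), 2))
```

The trade-off is more, smaller matrix multiplications in exchange for a bounded peak allocation. On a GPU, chunk_size would be tuned to the free memory reported by nvidia-smi.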

Should I try the suggestion in the error message? "If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation."
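For reference, PyTorch reads this variable when its CUDA allocator is first set up, so the variable must be set before CUDA is initialized. The minimal sketch below assumes the sort is started from a Python script. For the GUI, the variable would instead be set in the shell before launching it, for example "set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True" in cmd or $env:PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" in PowerShell. One caveat, to the best of my knowledge: PyTorch has not supported expandable segments on Windows and warns that the option is unsupported there. On a Windows machine like the one in this issue, the setting may therefore have no effect. Check the PyTorch CUDA memory-management notes for your version.

```python
import os

# PyTorch reads this when its CUDA caching allocator is first set up, so set it
# before importing torch (or anything that imports torch, such as kilosort).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch                       # import only after the variable is set
# from kilosort import run_kilosort  # then run the sort as usual
```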

hemant22 · Aug 04 '24

Okay. Can you please also try sorting a subset of the data, say with tmax = 1800 (the first 30 minutes)? The number of spikes you're detecting seems much larger than expected for a recording of this size, which might be why you're seeing this issue with this recording and not others. If you can sort a subset, that might reveal whether there are strange units or artifacts in the results that could be causing problems.
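Some of the numbers behind this suggestion can be recomputed from the logs in this thread. The sketch assumes the batch count is the ceiling of samples / batch_size. That assumption matches the logged "N batches: 4274" for the full session and "dshift shape: (900, 1)" for the 30-minute run. The sketch also computes a rough spike rate for the full session and builds a settings dict with tmax = 1800. subset_settings is a hypothetical helper. The run_kilosort(settings=..., probe_name=...) call follows the Kilosort4 README usage but is left commented out.

```python
import math

FS = 30000.0        # 'fs' from the logged ops
BATCH_SIZE = 60000  # 'batch_size' from the logged ops


def n_batches(n_samples: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of batches, counting a final partial batch as one."""
    return math.ceil(n_samples / batch_size)


def spike_rate(n_spikes: int, n_seconds: float, n_channels: int = 1) -> float:
    """Average detected spikes per second (optionally per channel)."""
    return n_spikes / n_seconds / n_channels


def subset_settings(data_dir: str, tmax: float = 1800.0) -> dict:
    """Hypothetical helper: settings that sort only the first tmax seconds."""
    return {"data_dir": data_dir, "n_chan_bin": 385, "tmin": 0.0, "tmax": tmax}


# Full session, using the numbers from the first log:
print(n_batches(256380588))                               # 4274
print(round(spike_rate(110152832, 8546.0196)))            # 12889 spikes/s overall
print(round(spike_rate(110152832, 8546.0196, 383), 1))    # 33.7 spikes/s per channel
# First 30 minutes only:
print(n_batches(int(1800 * FS)))                          # 900

# from kilosort import run_kilosort
# run_kilosort(settings=subset_settings("path/to/data"),
#              probe_name="neuropixPhase3B1_kilosortChanMap.mat")
```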

jacobpennington · Aug 06 '24

@jacobpennington I tried running with tmax=1800, and it ran successfully. What should I check next to figure out the problem?

Below is the log file:

08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO Computing preprocessing variables.
08-06 16:50 kilosort.run_kilosort INFO ----------------------------------------
08-06 16:50 kilosort.run_kilosort INFO Preprocessing filters computed in 2.04s; total 2.04s
08-06 16:50 kilosort.run_kilosort DEBUG hp_filter shape: torch.Size([30122])
08-06 16:50 kilosort.run_kilosort DEBUG whiten_mat shape: torch.Size([383, 383])
08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO Computing drift correction.
08-06 16:50 kilosort.run_kilosort INFO ----------------------------------------
08-06 16:50 kilosort.spikedetect INFO Re-computing universal templates from data.
08-06 18:00 kilosort.run_kilosort INFO drift computed in 4227.40s; total 4229.44s
08-06 18:00 kilosort.run_kilosort DEBUG st shape: (10527619, 6)
08-06 18:00 kilosort.run_kilosort DEBUG yblk shape: (1,)
08-06 18:00 kilosort.run_kilosort DEBUG dshift shape: (900, 1)
08-06 18:00 kilosort.run_kilosort DEBUG iKxx shape: torch.Size([383, 383])
08-06 18:00 kilosort.gui.sorter DEBUG First batch min, max: (-25.055134, 38.15639)
08-06 18:00 kilosort.run_kilosort INFO
08-06 18:00 kilosort.run_kilosort INFO Extracting spikes using templates
08-06 18:00 kilosort.run_kilosort INFO ----------------------------------------
08-06 18:00 kilosort.spikedetect INFO Re-computing universal templates from data.
08-06 19:04 kilosort.run_kilosort INFO 9929708 spikes extracted in 3801.92s; total 8032.19s
08-06 19:04 kilosort.run_kilosort DEBUG st0 shape: (9929708, 6)
08-06 19:04 kilosort.run_kilosort DEBUG tF shape: torch.Size([9929708, 10, 6])
08-06 19:04 kilosort.run_kilosort INFO
08-06 19:04 kilosort.run_kilosort INFO First clustering
08-06 19:04 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:07 kilosort.run_kilosort INFO 1533 clusters found, in 233.51s; total 8265.72s
08-06 19:07 kilosort.run_kilosort DEBUG clu shape: (9929708,)
08-06 19:07 kilosort.run_kilosort DEBUG Wall shape: torch.Size([1533, 383, 6])
08-06 19:07 kilosort.run_kilosort INFO
08-06 19:07 kilosort.run_kilosort INFO Extracting spikes using cluster waveforms
08-06 19:07 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:25 kilosort.run_kilosort INFO 25892920 spikes extracted in 1065.07s; total 9330.80s
08-06 19:25 kilosort.run_kilosort DEBUG st shape: (25892920, 3)
08-06 19:25 kilosort.run_kilosort DEBUG tF shape: torch.Size([25892920, 10, 6])
08-06 19:25 kilosort.run_kilosort DEBUG iCC shape: torch.Size([10, 383])
08-06 19:25 kilosort.run_kilosort DEBUG iU shape: torch.Size([1200])
08-06 19:25 kilosort.run_kilosort INFO
08-06 19:25 kilosort.run_kilosort INFO Final clustering
08-06 19:25 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:37 kilosort.run_kilosort INFO 1111 clusters found, in 710.35s; total 10041.17s
08-06 19:37 kilosort.run_kilosort DEBUG clu shape: (25892920,)
08-06 19:37 kilosort.run_kilosort DEBUG Wall shape: torch.Size([1111, 383, 6])
08-06 19:37 kilosort.run_kilosort INFO
08-06 19:37 kilosort.run_kilosort INFO Merging clusters
08-06 19:37 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:38 kilosort.run_kilosort INFO 956 units found, in 43.23s; total 10084.42s
08-06 19:38 kilosort.run_kilosort DEBUG clu shape: (25892920,)
08-06 19:38 kilosort.run_kilosort DEBUG Wall shape: torch.Size([956, 383, 6])
08-06 19:38 kilosort.run_kilosort INFO
08-06 19:38 kilosort.run_kilosort INFO Saving to phy and computing refractory periods
08-06 19:38 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:49 kilosort.run_kilosort INFO 417 units found with good refractory periods
08-06 19:49 kilosort.run_kilosort INFO Total runtime: 10774.82s = 02:59:35 h:m:s
08-06 19:49 kilosort.run_kilosort INFO Sorting output saved in: Z:\Users\Kyunghee\CN\Ephys\W5006\20240723\Loc1\Response\W5006_20240723_Loc1_g0\W5006_20240723_Loc1_g0_imec0\kilosort4.

hemant22 · Aug 07 '24

@hemant22 Can you open the results in Phy and check if anything looks off with the waveforms or anything else? Screenshots from that would be helpful.

jacobpennington · Aug 07 '24

@jacobpennington I checked the KS output (for tmax=1800) in Phy. I didn't find anything strange or different from the other sessions. Additionally, I was able to sort the full session with KS3 without any problems, so the raw data looks fine to me.

hemant22 · Aug 07 '24

Okay, thanks. If you're comfortable modifying the code, can you please try the change in this pull request and see if you're able to sort the full recording? It just adds a couple lines to one file. https://github.com/MouseLand/Kilosort/pull/758/files

jacobpennington · Aug 07 '24

Update to the previous comment: you no longer need to modify the code to try that. You can update to the latest version (v4.0.15) and use clear_cache=True.
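Below is a small sketch for checking whether the installed Kilosort is new enough before passing clear_cache=True. It assumes, as this comment implies, that v4.0.15 is the first release with the option. The option may have been added slightly earlier. parse_version and supports_clear_cache are illustrative helpers. importlib.metadata.version("kilosort") reads the installed package version. For version strings beyond plain X.Y.Z, packaging.version.Version is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from the comment above: clear_cache is available from v4.0.15.
MIN_CLEAR_CACHE = (4, 0, 15)


def parse_version(v: str) -> tuple[int, ...]:
    """'4.0.15' -> (4, 0, 15); stops at the first component without leading digits."""
    nums = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)
        if m is None:
            break
        nums.append(int(m.group()))
    return tuple(nums)


def supports_clear_cache(v: str) -> bool:
    return parse_version(v) >= MIN_CLEAR_CACHE


def installed_kilosort_version():
    """Installed Kilosort version string, or None if Kilosort is not installed."""
    try:
        return version("kilosort")
    except PackageNotFoundError:
        return None


# Usage (commented out so the sketch runs without Kilosort installed):
# v = installed_kilosort_version()
# if v is not None and supports_clear_cache(v):
#     from kilosort import run_kilosort
#     run_kilosort(settings=settings, probe_name=probe_name, clear_cache=True)
```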

jacobpennington · Aug 11 '24

@jacobpennington Thank you for your help. It is working now. @Lathomas42 Thanks a lot.

hemant22 · Aug 12 '24

Hi Jacob (@jacobpennington), I used clear_cache=True with v4.0.15, but I'm still getting the torch.OutOfMemoryError: CUDA out of memory error. It occurs immediately after the first clustering. Could you look at the error and suggest how to fix it? Thank you!!

SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\basesorter.py", line 261, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\external\kilosort4.py", line 273, in _run_from_folder
    st, tF, _, _ = detect_spikes(ops, device, bfile, tic0=tic0, progress_bar=progress_bar)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\run_kilosort.py", line 611, in detect_spikes
    st, tF, ops = template_matching.extract(ops, bfile, Wall3, device=device,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 26, in extract
    ctc = prepare_matching(ops, U)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 108, in prepare_matching
    ctc = torch.einsum('ijkm, kml -> ijl', UtU, WtW)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\torch\functional.py", line 380, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.53 GiB. GPU 0 has a total capacity of 10.00 GiB of which 0 bytes is free. Of the allocated memory 31.35 GiB is allocated by PyTorch, and 43.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Hobart10 · Aug 20 '24

@Hobart10 Can you please provide a screenshot of what the KS4 GUI looks like when you load your data, and the kilosort4.log file from the results directory? The fact it's trying to allocate ~54 GB of video memory indicates there's some other problem causing this.

jacobpennington · Aug 20 '24

I just found out that, in my case, it works when running Kilosort directly but not through SpikeInterface. I will ask there. Thank you!!

Hobart10 · Aug 21 '24

I still get an OOM error from this line (vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)), even with clear_cache=True.

RobertoDF · Sep 03 '24

@RobertoDF Is that still the case? Just checking since you closed your pull requests.

jacobpennington · Sep 03 '24

Hi! We are also getting a CUDA memory error; unfortunately, none of the suggestions above or in the other related threads solve it.

We have tried clearing the GPU cache, using the qr.kmeansplusplus version, and even an older version of KS, but it always runs out of memory at the final clustering stage. We get the same error on two different machines and on an HPC cluster, where we used a GPU with ~18 GB of memory. The datasets that give us issues are large (~300 GB), but we have successfully sorted larger datasets without issue. Looking at the data, I don't see any obvious problems that might cause KS to fail (e.g., the data is not noisy).

Do you have any other suggestions? Since we seem to need more memory, is it currently possible to run a single instance of kilosort4 across multiple GPUs at the same time?

EmmettJT · Feb 10 '25

Is the problem arising specifically at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)? If you used my pull request, at which line does it happen?

RobertoDF · Feb 10 '25

Yes. Without your pull request, the error occurs at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1).

With it, the error is either at line 215 (mu[j] = Xg[ix].mean(0)) or sometimes at line 171 (vtot = (Xg**2).sum(1)).

This seems very similar to the issues Peyton-D mentioned here: https://github.com/MouseLand/Kilosort/pull/775

EmmettJT, Feb 10 '25
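Both of the lines named above also create temporaries that scale with the data. `(Xg**2)` allocates a tensor the same size as `Xg` before the row sum, and boolean-mask indexing `Xg[ix]` returns a copy of the selected rows before `.mean(0)`. The pure-Python sketch below (hypothetical helper names) mirrors the two reductions. It also computes the masked mean as a 0/1-weighted sum, which needs no subset copy. In PyTorch, the analogous expression would be something like `(ix.float() @ Xg) / ix.sum()`. This is offered as an untested suggestion, not as Kilosort's code, and it is not known whether it lowers peak memory in practice:

```python
# Pure-Python equivalents of two reductions from the error reports:
#   vtot = (Xg**2).sum(1)   -> squared norm of each row
#   mu_j = Xg[ix].mean(0)   -> mean of the rows selected by a boolean mask

def row_sq_norms(Xg):
    """(Xg**2).sum(1): one squared norm per row."""
    return [sum(v * v for v in row) for row in Xg]

def masked_mean_copy(Xg, ix):
    """Mirrors Xg[ix].mean(0): copies the selected rows, then averages.
    Assumes at least one row is selected."""
    selected = [row for row, keep in zip(Xg, ix) if keep]  # the copy
    n = len(selected)
    return [sum(col) / n for col in zip(*selected)]

def masked_mean_weighted(Xg, ix):
    """Same result as a 0/1-weighted sum over all rows; no subset copy."""
    w = [1.0 if keep else 0.0 for keep in ix]
    n = sum(w)
    n_cols = len(Xg[0])
    return [sum(wi * row[j] for wi, row in zip(w, Xg)) / n for j in range(n_cols)]

Xg = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ix = [True, False, True]
print(row_sq_norms(Xg))              # [5.0, 25.0, 61.0]
print(masked_mean_copy(Xg, ix))      # [3.0, 4.0]
print(masked_mean_weighted(Xg, ix))  # [3.0, 4.0]
```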

Thanks! Working on a solution this week.

jacobpennington, Feb 12 '25

Hi everyone and @jacobpennington,

I've run into the same error. I tried to sort three recordings from three different days, all with the same NPX 2.0 probe and the same channel map. Each recording is 2-3 hours long, and all of them ran into the same issue.

The channel map is particularly complicated because I wanted to cover cell layers in different brain regions. I run concatenation and preprocessing with SpikeInterface, save the result as a binary file, and then run Kilosort4 directly on that binary file.

Would it help with troubleshooting if I uploaded one of the recordings? I can share it through OneDrive.


Here is the error:

100%|██████████| 4299/4299 [1:07:15<00:00, 1.07it/s]

 16%|█▋ | 18/110 [20:07<1:42:51, 67.08s/it]
Traceback (most recent call last):
  File "/home/saleem_lab/spikeinterface_sorting/temp_data/20250212/probe1/sorters/in_container_sorter_script.py", line 23, in <module>
    sorting = run_sorter_local(
              ^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/runsorter.py", line 261, in run_sorter_local
    SorterClass.run_from_folder(folder, raise_error, verbose)
  File "/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/basesorter.py", line 302, in run_from_folder
    raise SpikeSortingError(
spikeinterface.sorters.utils.misc.SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
  File "/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/basesorter.py", line 262, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/external/kilosort4.py", line 366, in _run_from_folder
    st, tF, _, _ = detect_spikes(
                   ^^^^^^^^^^^^^^
  File "/home/miniconda3/lib/python3.11/site-packages/kilosort/run_kilosort.py", line 608, in detect_spikes
    clu, Wall = clustering_qr.run(
                ^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py", line 393, in run
    iclust, iclust0, M, iclust_init = cluster(Xd, nskip=nskip, lam=1,
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py", line 147, in cluster
    iclust_init = kmeans_plusplus(Xg, niter = nclust, seed = seed, device=device)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py", line 206, in kmeans_plusplus
    vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)
           ~~^~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.34 GiB. GPU 0 has a total capacity of 10.75 GiB of which 1.47 GiB is free. Process 3029069 has 228.00 MiB memory in use. Process 4171570 has 9.05 GiB memory in use. Of the allocated memory 4.48 GiB is allocated by PyTorch, and 4.34 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

atombysx, Feb 17 '25
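The OOM message itself suggests trying `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` when a large amount of memory is reserved but unallocated (4.34 GiB in this report). This setting changes PyTorch's caching allocator, not Kilosort. It only takes effect if it is set before PyTorch initializes CUDA, so it is safest to set it before `import torch`. Alternatively, export it in the shell or Docker environment that launches the sorter. Below is a minimal sketch. This thread does not establish whether the setting alone avoids this particular error:

```python
import os

# Must be set before PyTorch initializes CUDA (safest: before `import torch`).
# setdefault keeps any value the user has already configured.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch                      # import afterwards so the allocator sees it
# from kilosort import run_kilosort
```

The shell equivalent is `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` before starting Python.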

@atombysx I'm working on a fix for this, data sharing shouldn't be necessary since I already have a few data sets from people with the same issue. I'll post another reply here after pushing the changes, and we can go from there if you still get out-of-memory errors with the new version.

jacobpennington, Feb 18 '25

@atombysx One thing would be helpful in the short term, though: can you please upload kilosort4.log from the results directory?

jacobpennington, Feb 18 '25
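An OOM usually shows up near the end of a run, so the last lines of `kilosort4.log` are often the most useful part to share. Below is a small stdlib sketch for pulling them out. `tail_log` is a hypothetical helper, and the temporary directory stands in for the real results directory:

```python
from collections import deque
from pathlib import Path
import tempfile

def tail_log(results_dir, n_lines=50, name="kilosort4.log"):
    """Return the last n_lines lines of the log file in results_dir."""
    log_path = Path(results_dir) / name
    with log_path.open("r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n_lines while streaming the file.
        return list(deque(f, maxlen=n_lines))

# Example with a throwaway directory standing in for the real results_dir.
with tempfile.TemporaryDirectory() as results_dir:
    (Path(results_dir) / "kilosort4.log").write_text(
        "".join(f"line {i}\n" for i in range(100)), encoding="utf-8"
    )
    last = tail_log(results_dir, n_lines=3)

print("".join(last), end="")  # line 97 / line 98 / line 99
```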

@EmmettJT Have you tried updating to the latest version of Kilosort4? The errors you're seeing are a bit different, and from the log you posted it looks like you're using an older version. Some of the changes in the last several versions affect memory usage.

jacobpennington, Feb 19 '25

I tried a few versions up to and including 4.0.21. I have just upgraded to 4.0.24 - I'll try a run now and let you know if it makes a difference!

Edit (20/02/25): no difference with the most recent version; same error.

EmmettJT, Feb 19 '25

(Quoting @jacobpennington: "One thing would be helpful in the short term, though: can you please upload kilosort4.log from the results directory?")

Hi @jacobpennington, sorry, I ran the processes through SpikeInterface using the Docker version, and only the following was saved when the process failed:

{
  "sorter_name": "kilosort4",
  "sorter_version": "4.0.18",
  "datetime": "2025-02-17T15:08:39.366915",
  "runtime_trace": [],
  "error": true,
  "error_trace": [
    "Traceback (most recent call last):",
    "  File \"/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/basesorter.py\", line 262, in run_from_folder",
    "    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)",
    "  File \"/root/.local/lib/python3.11/site-packages/spikeinterface/sorters/external/kilosort4.py\", line 366, in _run_from_folder",
    "    st, tF, _, _ = detect_spikes(",
    "                   ^^^^^^^^^^^^^^",
    "  File \"/home/miniconda3/lib/python3.11/site-packages/kilosort/run_kilosort.py\", line 608, in detect_spikes",
    "    clu, Wall = clustering_qr.run(",
    "                ^^^^^^^^^^^^^^^^^^",
    "  File \"/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py\", line 393, in run",
    "    iclust, iclust0, M, iclust_init = cluster(Xd, nskip=nskip, lam=1,",
    "                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
    "  File \"/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py\", line 147, in cluster",
    "    iclust_init = kmeans_plusplus(Xg, niter = nclust, seed = seed, device=device)",
    "                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
    "  File \"/home/miniconda3/lib/python3.11/site-packages/kilosort/clustering_qr.py\", line 206, in kmeans_plusplus",
    "    vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)",
    "           ~~^~~~",
    "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.34 GiB. GPU 0 has a total capacity of 10.75 GiB of which 1.47 GiB is free. Process 3029069 has 228.00 MiB memory in use. Process 4171570 has 9.05 GiB memory in use. Of the allocated memory 4.48 GiB is allocated by PyTorch, and 4.34 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"
  ],
  "run_time": null
}

atombysx, Feb 24 '25
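When only SpikeInterface's JSON log survives, the key numbers can still be pulled out of its `error_trace`. Below is a minimal sketch. It assumes the JSON structure shown above and the current wording of PyTorch's OOM message, which may change between versions; requests reported in MiB rather than GiB will not match. The file name SpikeInterface uses for this log is not stated in this thread, so the sketch takes the JSON text directly:

```python
import json
import re

# Matches PyTorch's OOM wording as seen in this thread (GiB units only).
OOM_PATTERN = re.compile(
    r"Tried to allocate ([\d.]+) GiB\. GPU \d+ has a total capacity of ([\d.]+) GiB "
    r"of which ([\d.]+) GiB is free"
)

def summarize_oom(log_text):
    """Return OOM numbers (GiB) from a SpikeInterface JSON log, or None."""
    log = json.loads(log_text)
    if not log.get("error"):
        return None
    for line in reversed(log.get("error_trace", [])):
        m = OOM_PATTERN.search(line)
        if m:
            requested, total, free = map(float, m.groups())
            # "shortfall" = requested minus free at the moment of the error.
            return {"requested": requested, "total": total, "free": free,
                    "shortfall": round(requested - free, 2)}
    return None

example = json.dumps({
    "sorter_name": "kilosort4",
    "error": True,
    "error_trace": [
        "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.34 GiB. "
        "GPU 0 has a total capacity of 10.75 GiB of which 1.47 GiB is free."
    ],
})
print(summarize_oom(example))
# {'requested': 4.34, 'total': 10.75, 'free': 1.47, 'shortfall': 2.87}
```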

@atombysx Please try upgrading to the latest version of Kilosort (v4.0.25) and running it without SpikeInterface. I can't debug this without the details provided in the log, and several changes in the past few versions can affect memory usage.

jacobpennington, Feb 25 '25
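To run Kilosort4 from a script without SpikeInterface, below is a minimal sketch based on the `run_kilosort` example in the Kilosort4 README. The data path and channel count are placeholders, and the probe file name is the one from the log at the top of this issue. `clear_cache` is the option mentioned earlier in this thread. Check the keyword names against the docstring of the installed version. When run this way, Kilosort writes `kilosort4.log` to its results directory:

```python
from pathlib import Path

def build_settings(data_dir, n_chan_bin=385):
    """Minimal settings dict; other parameters fall back to Kilosort defaults."""
    return {"data_dir": str(Path(data_dir)), "n_chan_bin": n_chan_bin}

def sort_directly(data_dir, probe_name, n_chan_bin=385):
    # Imported lazily so this file can be loaded without Kilosort installed.
    from kilosort import run_kilosort
    settings = build_settings(data_dir, n_chan_bin)
    # clear_cache=True asks PyTorch to release cached GPU memory between
    # memory-heavy steps; this trades speed for lower memory use.
    return run_kilosort(settings=settings, probe_name=probe_name, clear_cache=True)

# Example (placeholder path):
# sort_directly("/path/to/data", "neuropixPhase3B1_kilosortChanMap.mat")
```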

For anyone still encountering this: please try updating to the latest release (v4.0.26). It includes several bug fixes that improve memory usage during clustering. You can see my comment at the bottom of #775 if you'd like more explanation.

jacobpennington, Feb 27 '25
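It is worth confirming which version is actually imported, especially in containerized setups; the Docker run above reported 4.0.18. Below is a small stdlib check against the v4.0.26 release mentioned here. It assumes the package is installed under the distribution name `kilosort` (as with `pip install kilosort`) and that versions are plain numeric strings such as `4.0.26`:

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "4.0.26"

def parse(v):
    """'4.0.26' -> (4, 0, 26); assumes plain numeric dotted versions."""
    return tuple(int(part) for part in v.split("."))

def is_at_least(installed, minimum=MINIMUM):
    return parse(installed) >= parse(minimum)

def installed_kilosort_version():
    try:
        return version("kilosort")
    except PackageNotFoundError:
        return None

ks = installed_kilosort_version()
if ks is None:
    print("kilosort is not installed in this environment")
elif not is_at_least(ks):
    # GUI installs use the kilosort[gui] extra instead.
    print(f"kilosort {ks} is older than {MINIMUM}; try: pip install --upgrade kilosort")
else:
    print(f"kilosort {ks} is new enough")
```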