Out of memory during dredge correction
Hi all, I encountered an out-of-memory error while running dredge correction on a ~5-hour recording. My machine has 128 GB of RAM, and I’ve successfully run dredge on even longer recordings before without issues. Do you have any suggestions for how to resolve this? Here’s the error message I received:

```
MemoryError                               Traceback (most recent call last)
Cell In[2], line 7
      5 if folder.exists():
      6     shutil.rmtree(folder)
----> 7 corrected_rec, motion_info = si.correct_motion(recording=rec3, preset=preset, folder=folder, output_motion_info=True, **job_kwargs)
      8 motion_info = si.load_motion_info(folder)
      9 fig = plt.figure(figsize=(14, 8))

File c:\Users\admin\.conda\envs\fob_kilosort\lib\site-packages\spikeinterface\preprocessing\motion.py:436, in correct_motion(recording, preset, folder, output_motion, output_motion_info, overwrite, detect_kwargs, select_kwargs, localize_peaks_kwargs, estimate_motion_kwargs, interpolate_motion_kwargs, **job_kwargs)
    429 run_times = dict(
    430     detect_peaks=t1 - t0,
    431     select_peaks=t2 - t1,
    432     localize_peaks=t3 - t2,
    433 )
    435 t0 = time.perf_counter()
--> 436 motion = estimate_motion(recording, peaks, peak_locations, progress_bar=progress_bar, **estimate_motion_kwargs)
    437 t1 = time.perf_counter()
    438 run_times["estimate_motion"] = t1 - t0

File c:\Users\admin\.conda\envs\fob_kilosort\lib\site-packages\spikeinterface\sortingcomponents\motion\motion_estimation.py:111, in estimate_motion(recording, peaks, peak_locations, direction, rigid, win_shape, win_step_um, win_scale_um, win_margin_um, method, extra_outputs, progress_bar, verbose, margin_um, **method_kwargs)
    108     extra = None
    110 # run method
--> 111 motion = method_class.run(
    112     recording,
...
--> 719 negHUb = -Ub - Ub.T
    720 diagonal_terms = np.diagonal(negHUb) + Ub.sum(1) + Ub.sum(0)
    721 if Ub_prevcur is None:

MemoryError: Unable to allocate 3.12 GiB for an array with shape (20448, 20448) and data type float64
```
@samuelgarcia and maybe @cwindolf, any ideas here?
Maybe it would be good to have all internal matrices in dredge as float32, don't you think @samuelgarcia?
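Back-of-the-envelope on the array from the error (just arithmetic on the reported shape):

```python
import numpy as np

n = 20448  # side length of the array in the MemoryError above
print(n**2 * np.dtype(np.float64).itemsize / 2**30)  # ~3.12 GiB, matches the error
print(n**2 * np.dtype(np.float32).itemsize / 2**30)  # ~1.56 GiB if stored as float32
```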
@yger that could be good! There are some problems where double precision seems helpful for the linear algebra, but it could work.
Alternatively @liyipeng-moon if you can tolerate using 2s time bins instead of 1s time bins for motion estimation, then it should be alright. (For example, KS uses 2.18s motion bins usually, so likely this is fine!)
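Adapting your call from above, something like this should do it (a sketch; `bin_s` goes through `estimate_motion_kwargs`, which `correct_motion` forwards to `estimate_motion` as in the traceback):

```python
corrected_rec, motion_info = si.correct_motion(
    recording=rec3,
    preset=preset,
    folder=folder,
    estimate_motion_kwargs={"bin_s": 2.0},  # 2s motion time bins instead of 1s
    output_motion_info=True,
    **job_kwargs,
)
```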
Generally, I've been meaning to fix this. I think that the best solution is to take advantage of the time horizon parameter and instantiate a sparse representation of a band matrix, which would lead to savings in this case. Then I would need to use a band matrix solver internally for the linear solve. I had tried this previously and found that it was slower than the dense solver, but maybe for large problems like this it's worthwhile. Or, maybe the default should just be 2s bins XD
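To sketch the band-matrix idea (illustrative only; `scipy.linalg.solveh_banded` is one off-the-shelf symmetric banded solver, and the 60-bin bandwidth is a made-up time horizon):

```python
import numpy as np
from scipy.linalg import solveh_banded

n_bins = 20448  # time bins, as in the error above
bw = 60         # bandwidth in bins, e.g. a 60 s time horizon with 1 s bins

# Symmetric banded ("ab") storage keeps only (bw + 1) x n_bins entries,
# ~10 MB here, instead of the dense n_bins x n_bins ~3.12 GiB array.
rng = np.random.default_rng(0)
ab = rng.standard_normal((bw + 1, n_bins))
ab[-1] = 4.0 * bw  # last row is the main diagonal; make it dominant so A is SPD

b = rng.standard_normal(n_bins)
x = solveh_banded(ab, b)  # solves A x = b without ever materializing dense A
```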
Thank you for your suggestion @cwindolf; this error occurred with 4s bins. For my recording with non-rapid drift, you're right that larger bins should work better (also see https://github.com/SpikeInterface/spikeinterface/issues/4066). I'll update the bin size and try again. I appreciate the help!
Thanks @liyipeng-moon! I just wanted to double-check that: above, it mentions a matrix of shape 20448, and 20448/60/60 = 5.68, so that would make sense for 1s bins in a 5ish-hour recording. Would you be able to confirm that the estimate_motion parameter bin_s was not set to 1s?
Btw, looking at the error I noticed that my dredge code makes a lot of unnecessary copies of arrays and is not the most memory efficient. Opened https://github.com/SpikeInterface/spikeinterface/pull/4119 which would reduce that @samuelgarcia
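The flavor of the change, using the line from the traceback as an example (a sketch of the idea, not the PR diff itself):

```python
import numpy as np

Ub = np.random.default_rng(0).standard_normal((4, 4))  # tiny stand-in for the big matrix

# before: this expression materializes two full-size temporaries
negHUb_old = -Ub - Ub.T

# after: one new allocation, then in-place updates
negHUb = Ub.T.copy()             # single full-size allocation
negHUb += Ub                     # now holds Ub + Ub.T
np.negative(negHUb, out=negHUb)  # now holds -Ub - Ub.T

assert np.allclose(negHUb, negHUb_old)
```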
Thanks Charlie for the fix. Did you also force the float32 mentioned by Pierre?
Hi @cwindolf, thank you for the clarification. You are absolutely right. I realized I was adjusting the chunk_duration instead of bin_s, which explains why the matrix shape remained at 20448 regardless of my changes. I will adjust the bin_s parameter and test again.
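In case it helps others who hit this, a minimal sketch of where each parameter actually lives (names from my call above; the chunk_duration value is just a placeholder):

```python
job_kwargs = dict(chunk_duration="1s")    # only chunks traces for peak detection/localization
estimate_motion_kwargs = dict(bin_s=2.0)  # sets the motion time bins (20448 -> 10224 here)

corrected_rec, motion_info = si.correct_motion(
    recording=rec3, preset=preset, folder=folder,
    estimate_motion_kwargs=estimate_motion_kwargs,
    output_motion_info=True, **job_kwargs,
)
```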