Issue saving large preprocessed recording to binary file
Hello! I'm a new user of SpikeInterface, currently setting up a pipeline for processing Neuropixels 1.0 data. I'm experiencing an issue with saving the preprocessed recording (phase shift correction, highpass filtering, and common reference applied to the raw data) before running Kilosort. I didn't have a problem with a smaller dataset, but with this 2.5-hour recording the process seems stuck: it doesn't give an error message, and the operation is not completed after several hours.
Here is the code:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss

# base_folder is a pathlib.Path defined earlier, pointing to the SpikeGLX recording
ap_raw = se.read_spikeglx(folder_path=base_folder, stream_id='imec0.ap')
ap_process = ap_raw
ap_process = spre.phase_shift(ap_process)
ap_process = spre.highpass_filter(ap_process)
ap_process = spre.common_reference(ap_process, reference='global', operator='median')

job_kwargs = dict(n_jobs=-1, chunk_duration="1s", progress_bar=True)
recording_saved = ap_process.save(folder=base_folder / "preprocessed", verbose=True, **job_kwargs)

job_kwargs = dict(delete_tmp_files=False, n_jobs=-1, chunk_duration="1s", progress_bar=True)
sorter_params = job_kwargs
sorting_KS25 = ss.run_sorter('kilosort2_5', recording_saved,
                             output_folder=base_folder / 'results_KS25',
                             verbose=True, **sorter_params)
And here is a screenshot of the output folder:
I appreciate any help or suggestions on how to send the data to KS. Many thanks!
Hi @florgf88
Is the progress bar printed? Can you monitor the RAM and CPU usage?
Hi @alejoe91, thanks for your reply!
When writing to binary gets stuck, it never prints the progress bar. Here is a screenshot monitoring RAM and CPU usage:
Another thing I noticed is that if I run the script only up to saving the preprocessed data, without the sorter steps, it sometimes completes the operation (not sure if this is helpful). Could it be that the PC is not powerful enough? Processor: Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz, RAM 32.0 GB (31.8 GB usable).
Many thanks!
Can you try the save with fewer jobs (e.g. 4), and maybe also play with mp_context="spawn"?
And also, as a general comment: for enormous datasets you should use a remote Linux server and not a local machine. All the parallel processing works better on Linux because multiprocessing can use fork, which is faster.
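For concreteness, here is a minimal sketch of what that suggestion could look like, reusing ap_process and base_folder from the snippet above; treat it as an illustration rather than a prescribed fix.

# Hypothetical sketch: fewer workers and the "spawn" start method for the
# chunked save, instead of n_jobs=-1 (all cores).
job_kwargs = dict(n_jobs=4, chunk_duration="1s", progress_bar=True, mp_context="spawn")
recording_saved = ap_process.save(folder=base_folder / "preprocessed", verbose=True, **job_kwargs)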
Hi Samuel, thanks a lot for your response! And many thanks for the advice regarding using a Linux server remotely; I will consider this option.
I have tried running my script with fewer jobs and with mp_context="spawn" for saving. Unfortunately, this didn't work. With n_jobs=1, it managed to run the full script once, but it took a long time to complete. I have also tried adding a line at the end of the run function in job_tools (see below), but this didn't solve the issue. When I run the script in 2 steps: 1) using all the cores for saving to binary, and 2) running Kilosort and compressing the data with mtscomp, everything works well. Any other suggestions that I could try? Many thanks!
def run(self):
    """
    Runs the defined jobs.
    """
    all_chunks = divide_recording_into_chunks(self.recording, self.chunk_size)

    if self.handle_returns:
        returns = []
    else:
        returns = None

    if self.n_jobs == 1:
        if self.progress_bar:
            all_chunks = tqdm(all_chunks, ascii=True, desc=self.job_name)

        worker_ctx = self.init_func(*self.init_args)
        for segment_index, frame_start, frame_stop in all_chunks:
            res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
            if self.handle_returns:
                returns.append(res)
            if self.gather_func is not None:
                self.gather_func(res)
    else:
        n_jobs = min(self.n_jobs, len(all_chunks))

        ######## Do you want to limit the number of threads per process?
        ######## It has to be done to speed up numpy a lot if multicores
        ######## Otherwise, np.dot will be slow. How to do that, up to you
        ######## This is just a suggestion, but here it adds a dependency

        # parallel
        with ProcessPoolExecutor(
            max_workers=n_jobs,
            initializer=worker_initializer,
            mp_context=mp.get_context(self.mp_context),
            initargs=(self.func, self.init_func, self.init_args, self.max_threads_per_process),
        ) as executor:
            results = executor.map(function_wrapper, all_chunks)

            if self.progress_bar:
                results = tqdm(results, desc=self.job_name, total=len(all_chunks))

            for res in results:
                if self.handle_returns:
                    returns.append(res)
                if self.gather_func is not None:
                    self.gather_func(res)

            executor.shutdown(wait=True, cancel_futures=False)  # line added by florgf88

    return returns
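As a side note on the ######## comments in the excerpt above: one common way to limit the number of threads each worker process uses is to apply a BLAS/OpenMP limit inside the worker initializer. Below is a minimal, self-contained sketch of that idea using the threadpoolctl package; it is not the spikeinterface implementation, just an illustration of the pattern.

import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from threadpoolctl import threadpool_limits

# Keep a module-level reference so the limit stays active for the life of the worker.
_limits_controller = None


def limited_worker_initializer(max_threads_per_process):
    # Cap the threads used by numpy's BLAS backend (MKL/OpenBLAS) in this worker,
    # so that n_jobs processes do not oversubscribe the CPU.
    global _limits_controller
    _limits_controller = threadpool_limits(limits=max_threads_per_process)


def heavy_chunk_task(_chunk_index):
    # Stand-in for per-chunk work that calls into numpy (e.g. np.dot).
    a = np.random.rand(200, 200)
    return float(np.dot(a, a.T).sum())


if __name__ == "__main__":
    with ProcessPoolExecutor(
        max_workers=4,
        mp_context=mp.get_context("spawn"),
        initializer=limited_worker_initializer,
        initargs=(1,),
    ) as executor:
        print(sum(executor.map(heavy_chunk_task, range(8))))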
I want to bring this back up in our minds. I was helping someone (two people actually) set up SI in our lab a little while ago (maybe December), and we could not get the write to binary to work. It would always hang right before the progress bar appears. We then took the exact same file, brought it over to my computer, and were able to write the binary file with no problem. I tried tuning the job_kwargs extensively on their computers (Windows 10 and Windows 11) and nothing worked. They set it aside and worked on some other parts of their project, and then last week we finally just deleted the conda environment and I installed everything from source (Neo, ProbeInterface, SpikeInterface), and then it worked. So I'm not sure if we had a bug for Windows in one of the older versions, but I just fixed this last week on someone's computer by doing a full source-based install for them. (Weird that I've never run into this problem despite using multiple versions.) But just to provide more data about this happening.
Thanks for the feedback. So to summarize: with recent versions of python + numpy + spikeinterface, things are going better, right?
Yeah, python 3.11, numpy 1.26, and the current dev version of spikeinterface. But I never figured out why it wasn't working (it has worked on my machine with python 3.9, 3.10, and 3.11 plus multiple numpy versions, and I think I started using SI with 0.97 or 0.98).
@zm711 I fixed most of the binary-writing errors around those versions.
There was a memory leak with np.memmap (or with the way we use it in conjunction with multiprocessing), and it was solved by creating and destroying the memmap.
Here is the PR; the tag says it was added in version 0.98: https://github.com/SpikeInterface/spikeinterface/pull/1602
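To illustrate the idea (this is not the PR's actual code, just a sketch of the pattern), the fix amounts to opening the memmap per chunk and dropping the reference right after writing:

import numpy as np


def write_chunk(binary_path, traces_chunk, frame_start, total_frames, num_channels, dtype="int16"):
    # Open the memmap, write one chunk, flush, and delete the reference so the
    # mapping can be released instead of accumulating across chunks/workers.
    memmap = np.memmap(binary_path, dtype=dtype, mode="r+",
                       shape=(total_frames, num_channels))
    memmap[frame_start:frame_start + traces_chunk.shape[0], :] = traces_chunk
    memmap.flush()
    del memmap  # destroy the memmap after each chunk


# Usage sketch: pre-allocate the binary once, then write chunk by chunk.
if __name__ == "__main__":
    total_frames, num_channels = 30_000, 384
    np.memmap("recording.dat", dtype="int16", mode="w+",
              shape=(total_frames, num_channels)).flush()
    write_chunk("recording.dat", np.zeros((10_000, num_channels), dtype="int16"),
                frame_start=0, total_frames=total_frames, num_channels=num_channels)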
@florgf88 any chance you could run this again with the latest version? Otherwise we should close this due to inactivity.
Many thanks for bringing this up!
@h-mayorquin I found another solution at that time that didn't use SI, but I will try again and let you know the outcome.
Just to clarify @h-mayorquin, they would have installed spikeinterface around December, so that fix would already have been present in version 0.99. I know you worked a lot on fixing this, and maybe updating versions fixed things, but it seems the fix for them was going from version 0.99 to 0.101 with the associated dependency updates. (For context, this was for writing a binary to be used by KS.)
Thanks @florgf88, it would be great to know if updating fixes the issue for you.
Got you. I am very confused as to why this could be. The only component of write_binary is numpy's memmap, and I don't think that has changed across versions or recently.
What format was it that you were trying to write? Most likely the recording is the one that did not deallocate the loaded traces. Did you try non-parallel writing, to avoid any OS-specific race condition?
I guess we will never know. Anyway, telling people to do a full re-install should be one of the first things that we suggest so...
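For reference, a minimal sketch of what the non-parallel writing suggested above could look like, reusing the variable names from the snippet at the top of this thread (the output folder name here is just illustrative):

# Hypothetical sketch: force a single-process (non-parallel) write of the
# Kilosort binary by passing n_jobs=1 through the sorter parameters, as the
# original snippet in this thread already does for the parallel case.
sorter_params = dict(n_jobs=1, chunk_duration="1s", progress_bar=True)
sorting = ss.run_sorter('kilosort2_5', recording_saved,
                        output_folder=base_folder / 'results_KS25_serial',
                        verbose=True, **sorter_params)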
We are going from Intan's .rhd to recording.dat. The code for that is: https://github.com/SpikeInterface/spikeinterface/blob/52617ee93d532f8e8a6b4a675a7f4c1cca304873/src/spikeinterface/sorters/external/kilosortbase.py#L129-L160
This should then just write the binary recording. We tried it once (n_jobs=1); it was a 40 GB file so it was being done overnight, and my labmate was too impatient when she got back to work and just canceled it, so we never saw whether it finished or not. Like I said, it has always worked on my computer, which is weird (one of my labmates has the exact same model of computer as me).
So I agree I don't think we will ever know....
Anyway, telling people to do a full re-install should be one of the first things that we suggest so...
:) yep I guess so....
Yeah, hard to venture guesses other than the memmaps of the Intan reader not getting released. On Windows the committed memory might act weird. I would have needed to be there.