There might be a bug during the folder creation process
Hello, whenever I use the run_sorter() method with the options 'mountainsort5', 'spykingcircus2', 'tridesclous2', or 'herdingspikes', I consistently encounter the error message 'Folder xxx_output already exists'. I noticed that the path parameter in run_sorter() is set to the default value, and there are no existing files in the specified path. It appears that there might be a bug occurring during the folder creation process. Would you mind taking a look at this issue and possibly resolving it? Thank you for your assistance!
Hi @daisy-zsn
Could you provide a script that reproduces the issue? I personally haven't encountered this.
What do you mean by "specified path"? Is it the current folder? By default, run_sorter will create a folder `{sorter_name}_output` in the current folder, unless you specify a different output_folder. So if you run the same script twice, it will (correctly) trigger this error. You can use `remove_existing_folder=True` to overwrite the output folder.
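For illustration, the folder check behaves roughly like the sketch below. This is a hypothetical re-implementation for clarity, not SpikeInterface's actual code, and `prepare_output_folder` is a made-up name:

```python
import shutil
from pathlib import Path

def prepare_output_folder(sorter_name, folder=None, remove_existing_folder=False):
    """Hypothetical sketch of the folder check run_sorter performs."""
    # Default output folder: {sorter_name}_output in the current directory.
    folder = Path(folder) if folder is not None else Path.cwd() / f"{sorter_name}_output"
    if folder.exists():
        if not remove_existing_folder:
            raise FileExistsError(f"Folder {folder} already exists")
        # Overwrite was requested: wipe the stale output first.
        shutil.rmtree(folder)
    folder.mkdir(parents=True)
    return folder
```

Running it twice with the defaults reproduces the error from this issue; passing `remove_existing_folder=True` clears the old folder instead.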
I have a theory, but I don't know much about these sorters, so you'll have to tell me if it makes sense @alejoe91 and @samuelgarcia .
If you call recording.save(format="zarr", ...) (or similar) with n_jobs > 1, and your default multiprocessing context is not "fork", then the child processes don't inherit the parent's memory directly. They start a fresh Python interpreter, which means the module you're running is re-imported. If that module doesn't have an if __name__ == "__main__": guard, the entire script, including recording.save(), executes again in each child process, which causes the assertion error (because the output directory was already created by the parent/main process).
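You can see this re-import behavior with a minimal stdlib-only demo, no SpikeInterface involved. The top-level print below runs once in the parent and again in every spawned worker, because each worker re-imports the main module (under the name `__mp_main__`):

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# With the "spawn" (or "forkserver") start method, every worker process
# re-imports the main module, so module-level code runs again in each child.
script = textwrap.dedent("""
    import multiprocessing as mp

    # Module-level code: runs in the parent AND in every spawned worker.
    print(f"top-level code ran, __name__={__name__}", flush=True)

    def work(x):
        return x + 1

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        with ctx.Pool(2) as pool:
            print(pool.map(work, [1, 2]), flush=True)
""")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.py"
    path.write_text(script)
    out = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, text=True, check=True,
    ).stdout

print(out)
```

The "top-level code ran" line appears multiple times: once for the parent (`__name__ == "__main__"`) and once per worker (`__name__ == "__mp_main__"`). Without the guard, those re-imported children would re-run the whole pipeline, which in the SpikeInterface case means re-running recording.save() and hitting the already-created output folder.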
I just ran into this because I upgraded from Python 3.13 to Python 3.14. On my platform (Linux), Python 3.14 changed the default multiprocessing context from "fork" to "forkserver", uncovering this race condition. So all of a sudden, the same script with the same version of zarr (v2.18.7) and the same version of spikeinterface (v0.103.3) started producing this assertion error.
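You can check which default your platform and Python version use with the stdlib alone:

```python
import multiprocessing as mp

# Report the default start method for this interpreter.
# On Linux, Python <= 3.13 defaults to "fork"; 3.14 moved to "forkserver".
# macOS and Windows have defaulted to "spawn" for a long time.
method = mp.get_start_method()
print(method)
```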
Thankfully it's an easy fix: use if __name__ == "__main__": and wrap your code in a function, for example:
```python
import spikeinterface.full as si  # read_spikeglx lives in the extractors module

def main():
    recording = si.read_spikeglx(...)
    # Execute pipeline
    ...
    recording.save(format="zarr", ...)

if __name__ == "__main__":
    main()
```
Alternatively, you could skip the refactor and set mp_context="fork" in the call to save, but that's not recommended. You should use the if __name__ == "__main__": guard regardless of which mp_context you use.
It was a nightmare to figure out and debug. I wouldn't be surprised if you see more of these issues as adoption of 3.14 picks up.
I just want to clarify that this is NOT an issue with spikeinterface, it is just a user error that is really easy to make if you're copy-pasting code snippets from a tutorial or how-to guide into a my_pipeline.py and executing that.
We should definitely add this to the docs!!!
Yes, agreed. We should be very clear about using the if __name__ == "__main__": guard in scripts that use spikeinterface, because of the multiprocessing.