
run_sorter in container fails on Windows when recording is not on OS drive (D:) — in_container_sorting path malformed

Open SpencerBowles opened this issue 7 months ago • 14 comments

Summary

We encountered a failure when using spikeinterface.sorters.run_sorter(..., docker_image=...) on Windows 11, when the recording is stored on a secondary drive (D:). The sorter completes, and the container stops, but the loading of the result fails because the path to the serialized recording inside in_container_sorting is malformed — it lacks a drive letter and begins with a backslash.

I have tried overriding the recording object's path kwarg with a POSIX version of the path, but it does not change the error.

This issue does not occur:

  • When running the same call without Docker (in Conda)
  • When the recording path is on the OS drive (C:)

System Info

  • Windows 11
  • SpikeInterface commit 9d6ad5b29
  • Docker image: `datajoint-spikeinterface:latest` (custom-built)
  • Base image: NVIDIA CUDA image (nvidia/cuda:11.7.1)
  • SpikeInterface installed from local copy

Here is the command as I run it:

sorting = run_sorter(
    sorter_name=sorter_name["name"],
    recording=recording,
    folder=output_path,
    installation_mode="folder",
    spikeinterface_folder_source=code_src[0],
    remove_existing_folder=True,
    verbose=True,
    docker_image=docker_image[0],
)

Error Trace

ValueError: D:\NP_sorted_backup\...\cleaning\sorting\in_container_sorting is not a file or a folder. It should point to either a json, pickle file or a folder that is the result of extractor.save(...)

Inner stack trace:

FileNotFoundError: [Errno 2] No such file or directory: '\\NP_sorted_backup\\...\\cleaning\\binary.json'

Notice that the path in the inner FileNotFoundError is missing the drive letter.
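A quick standard-library check illustrates why that inner path fails: a backslash-rooted Windows path with no drive letter is not absolute, so Windows resolves it against the process's current drive rather than D:. This sketch uses a shortened version of the failing path (the elided middle directories are omitted):

```python
from pathlib import PureWindowsPath

# Shortened version of the path from the FileNotFoundError
p = PureWindowsPath(r"\NP_sorted_backup\cleaning\binary.json")

# No drive letter means the path is "rooted" but not absolute:
# it gets resolved against the current drive of the process.
print(p.drive)          # ''
print(p.is_absolute())  # False
```

PureWindowsPath makes the check runnable on any OS, not just Windows.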

Here is the full error:

→ Populating SortingCompute
2025-03-27 12:03:20,918::INFO::sorting.py::Populating SortingCompute for ['Flea_2024-08-13_1'], key: {'recording_id': 1, 'doe': datetime.datetime(2024, 8, 13, 0, 0), 'attempt': 1, 'probe_id': 1, 'filtered': 1, 'clean_path': 'D:\\NP_sorted_backup\\2024-08-13_13-44-43_Flea_adaptation_implicit\\Record Node 101\\experiment1\\recording1\\continuous\\Neuropix-PXI-100.ProbeA\\cleaning', 'id': 1}
Fixed folder_path: D:/NP_sorted_backup/2024-08-13_13-44-43_Flea_adaptation_implicit/Record Node 101/experiment1/recording1/continuous/Neuropix-PXI-100.ProbeA/cleaning
True
2025-03-27 12:03:20,929::INFO::sorting.py::Saving new sorted recording to D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit\Record Node 101\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\cleaning\sorting.
Starting container
Running kilosort4 sorter inside spencer/datajoint-spikeinterface:latest
Stopping container
2025-03-27 22:02:19,578::WARNING::sorting.py::Unexpected error in the recording processing pipeline: D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit\Record Node 101\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\cleaning\sorting\in_container_sorting is not a file or a folder. It should point to either a json, pickle file or a folder that is the result of extractor.save(...)
Traceback (most recent call last):
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\sorters\runsorter.py", line 667, in run_sorter_container
    sorting = SorterClass.get_result_from_folder(folder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\sorters\basesorter.py", line 334, in get_result_from_folder
    recording = cls.load_recording_from_folder(output_folder, with_warnings=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\sorters\basesorter.py", line 212, in load_recording_from_folder
    recording = load_extractor(json_file, base_folder=output_folder)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 1184, in load_extractor
    return BaseExtractor.load(file_or_folder_or_dict, base_folder=base_folder)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 769, in load
    extractor = BaseExtractor.from_dict(d, base_folder=base_folder)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 515, in from_dict
    extractor = _load_extractor_from_dict(dictionary)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 1123, in _load_extractor_from_dict
    extractor = extractor_class(**new_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\binaryfolder.py", line 31, in __init__
    with open(folder_path / "binary.json", "r") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '\\NP_sorted_backup\\2024-08-13_13-44-43_Flea_adaptation_implicit\\Record Node 101\\experiment1\\recording1\\continuous\\Neuropix-PXI-100.ProbeA\\cleaning\\binary.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\mathi\github\auxPipelines-DataJoint_Mathis\neuropixels\neuropixels_schemas\np_pipeline\schemas\sorting.py", line 146, in make
    sorting = run_sorter(
              ^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\sorters\runsorter.py", line 210, in run_sorter
    return run_sorter_container(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\sorters\runsorter.py", line 670, in run_sorter_container
    sorting = load_extractor(in_container_sorting_folder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 1184, in load_extractor
    return BaseExtractor.load(file_or_folder_or_dict, base_folder=base_folder)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mathi\anaconda3\envs\si_env_rolling\Lib\site-packages\spikeinterface\core\base.py", line 805, in load
    raise ValueError(error_msg)
ValueError: D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit\Record Node 101\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\cleaning\sorting\in_container_sorting is not a file or a folder. It should point to either a json, pickle file or a folder that is the result of extractor.save(...)

SpencerBowles avatar Mar 31 '25 15:03 SpencerBowles

I believe we've been able to use docker from different drives, but it's been a while since we've tried. One easy question to start with... what is the exact value of output_folder that you are putting into the run_sorter?

zm711 avatar Mar 31 '25 15:03 zm711

I have the feeling that we handle this, but I do not remember. Is your case that the input (raw recording) is on a different drive than the output (sorter folder)?

samuelgarcia avatar Apr 01 '25 11:04 samuelgarcia

I believe we've been able to use docker from different drives, but it's been a while since we've tried. One easy question to start with... what is the exact value of output_folder that you are putting into the run_sorter?

The output folder is a subdirectory of the same directory as the recording object. I think it is being located fine, because the actual output of the sorting is saved there correctly in full; it is just the in_container_sorting folder that is missing. The output path looks like this:

In [36]: output_path
Out[36]: WindowsPath('D:/NP_Raw_backup/2024-12-12_13-56-16_Fossa_adaptation_implicit/Record Node 101/experiment1/recording1/continuous/Neuropix-PXI-107.ProbeD/cleaning/sorting')

SpencerBowles avatar Apr 01 '25 11:04 SpencerBowles

I have the feeling that we handle this, but I do not remember. Is your case that the input (raw recording) is on a different drive than the output (sorter folder)?

No, our target directory is a subdirectory of the folder that holds the recording object.

SpencerBowles avatar Apr 01 '25 11:04 SpencerBowles

One other question that's Windows-specific: do you have long paths enabled on your computer? Windows does this thing where it limits how deeply nested files can be. So if you have the default (unless they've changed it), you might be too nested.

I would recommend a few things:

  1. Try a local sorter (even if you want to do a tiny test with SC2 or TDC2--those come built into spikeinterface)
  2. Try to reduce the nesting of your folders as a test
  3. Bonus: read up on enabling longer paths on Windows and then check on your computer (I had to do this for my workstation :) )

zm711 avatar Apr 01 '25 12:04 zm711

One other question that's Windows-specific: do you have long paths enabled on your computer? Windows does this thing where it limits how deeply nested files can be. So if you have the default (unless they've changed it), you might be too nested.

I would recommend a few things:

  1. Try a local sorter (even if you want to do a tiny test with SC2 or TDC2--those come built into spikeinterface)
  2. Try to reduce the nesting of your folders as a test
  3. Bonus: read up on enabling longer paths on Windows and then check on your computer (I had to do this for my workstation :) )

I do not think that total path length is the issue here, as spikeinterface can interact with even more deeply nested files and directories when we create the sorting analyzer.
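For reference, the full path from the traceback can be measured directly, and it comes in well under the classic 260-character MAX_PATH default:

```python
# Full path from the FileNotFoundError, with the drive letter restored
p = (r"D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit"
     r"\Record Node 101\experiment1\recording1\continuous"
     r"\Neuropix-PXI-100.ProbeA\cleaning\binary.json")

print(len(p))        # well under 260
assert len(p) < 260  # classic MAX_PATH limit
```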

I believe this is specifically an issue with these lines in run_sorter (link):

    # find input folder of recording for folder bind
    rec_dict = recording.to_dict(recursive=True)
    recording_input_folders = find_recording_folders(rec_dict)

    if platform.system() == "Windows":
        rec_dict = windows_extractor_dict_to_unix(rec_dict)

I traced through the windows_extractor_dict_to_unix function and pulled out all of the helper functions it calls. When I run them in isolation on a Windows recording object, they rewrite the path kwargs without the drive information.
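The core of the conversion can be reproduced on any OS with PureWindowsPath. This is a sketch of the same slicing logic as the path_to_unix helper, applied unconditionally for demonstration (the real helper only strips on Windows):

```python
from pathlib import PureWindowsPath

def path_to_unix_demo(path):
    # Mirror the helper's logic: drop everything up to and including the
    # first ":" (i.e. the drive letter), then emit a POSIX-style string.
    s = str(PureWindowsPath(path))
    return PureWindowsPath(s[s.find(":") + 1:]).as_posix()

print(path_to_unix_demo(r"D:\NP_sorted_backup\cleaning"))
# -> /NP_sorted_backup/cleaning  (the 'D:' is gone)
```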

SpencerBowles avatar Apr 01 '25 13:04 SpencerBowles

If you want to test this, I wrote a script that just runs this function and then prints the paths in the recording object.

If you save this code as a Python script and run it in any environment that has spikeinterface installed, you should be able to see whether the behavior is the same. Just pass it the path to a correctly formatted recording object.

import platform
from pathlib import Path
from copy import deepcopy
from spikeinterface import load_extractor

# === SpikeInterface-derived helper functions ===


def path_to_unix(path):
    """Convert a Windows path to unix format"""
    path = Path(path)
    if platform.system() == "Windows":
        path = Path(str(path)[str(path).find(":") + 1 :])
    return path.as_posix()


def is_dict_extractor(d: dict) -> bool:
    """Check if a dict describes an extractor"""
    if not isinstance(d, dict):
        return False
    return all(k in d for k in ("module", "class", "version", "annotations"))


def recursive_path_modifier(d, func, target="path", copy=True):
    if copy:
        dc = deepcopy(d)
    else:
        dc = d

    if "kwargs" in dc:
        kwargs = dc["kwargs"]
        recursive_path_modifier(kwargs, func, copy=False)

        for k, v in kwargs.items():
            if isinstance(v, dict) and is_dict_extractor(v):
                recursive_path_modifier(v, func, copy=False)
            elif isinstance(v, list):
                for vl in v:
                    if isinstance(vl, dict) and is_dict_extractor(vl):
                        recursive_path_modifier(vl, func, copy=False)

        return dc
    else:
        for k, v in d.items():
            if target in k:
                if v is None:
                    continue
                if isinstance(v, (str, Path)):
                    dc[k] = func(v)
                elif isinstance(v, list):
                    dc[k] = [func(e) for e in v]
                else:
                    raise ValueError(f"{k} key for path must be str or list[str]")
        return dc


def windows_extractor_dict_to_unix(d):
    return recursive_path_modifier(d, path_to_unix, target="path", copy=True)


def print_paths_from_kwargs(d, prefix=""):
    if not isinstance(d, dict):
        return
    for k, v in d.items():
        key_path = f"{prefix}.{k}" if prefix else k
        if "path" in k.lower():
            print(f"{key_path}: {v}")
        if isinstance(v, dict):
            print_paths_from_kwargs(v, key_path)
        elif isinstance(v, list):
            for i, item in enumerate(v):
                if isinstance(item, dict):
                    print_paths_from_kwargs(item, f"{key_path}[{i}]")


# === Load and convert a recording object ===

if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print("Usage: python test_si_paths.py <clean_path>")
        sys.exit(1)

    clean_path = sys.argv[1]
    recording = load_extractor(clean_path)
    rec_dict = recording.to_dict(recursive=True)

    print("➤ BEFORE CONVERSION:")
    print_paths_from_kwargs(rec_dict)

    if platform.system() == "Windows":
        rec_dict = windows_extractor_dict_to_unix(rec_dict)

    print("\n➤ AFTER CONVERSION:")
    print_paths_from_kwargs(rec_dict)

SpencerBowles avatar Apr 01 '25 13:04 SpencerBowles

When I try this I get a result like this:

(si_env_rolling) C:\Users\mathi\github\auxPipelines-DataJoint_Mathis\neuropixels\neuropixels_schemas>python test_si_paths.py "D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit\Record Node 101\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\cleaning"
➤ BEFORE CONVERSION:
kwargs.folder_path: D:\NP_sorted_backup\2024-08-13_13-44-43_Flea_adaptation_implicit\Record Node 101\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\cleaning
relative_paths: False

➤ AFTER CONVERSION:
kwargs.folder_path: /NP_sorted_backup/2024-08-13_13-44-43_Flea_adaptation_implicit/Record Node 101/experiment1/recording1/continuous/Neuropix-PXI-100.ProbeA/cleaning
relative_paths: False

This matches the traceback error I see when I try using run_sorter.

FileNotFoundError: [Errno 2] No such file or directory: '\\NP_sorted_backup\\2024-08-13_13-44-43_Flea_adaptation_implicit\\Record Node 101\\experiment1\\recording1\\continuous\\Neuropix-PXI-100.ProbeA\\cleaning\\binary.json'

SpencerBowles avatar Apr 01 '25 13:04 SpencerBowles

Sorry, and if you use a local sorter, does the same issue happen? Like Sam said, it used to be impossible to use different drives, but we fixed that. (I routinely sort across the C drive and a network-mapped drive, and I've sorted across C and D.) But I personally do it locally. Could you try a local sorter and see if you get the same error? Then we can narrow it down more to Docker issues, which I think it could be, because I would bet the Docker container doesn't have an associated drive.

I'll test your script and see what it does on my machine!

zm711 avatar Apr 01 '25 14:04 zm711

Per the issue summary above "This issue does not occur:

  • When running the same call without Docker (in Conda)
  • When the recording path is on the OS drive (C:)"

SpencerBowles avatar Apr 01 '25 14:04 SpencerBowles

It really does seem that this drive deletion is by design, though. It's even in the descriptive comments.

SpencerBowles avatar Apr 01 '25 14:04 SpencerBowles

Hey all, just checking the status of this -- anything more we can provide? @zm711 cc @SpencerBowles ? 🙏🏼

MMathisLab avatar Apr 11 '25 17:04 MMathisLab

It really does seem that this drive deletion is by design, though. It's even in the descriptive comments.

Hi @SpencerBowles this is only to map the original parent recording (with the drive info) to a unix-like path to use in the container.

I'll look into this next week. Sorry about the delay on this!

alejoe91 avatar Apr 12 '25 08:04 alejoe91

@SpencerBowles can you test this in the meantime:

  • keep the recording path on the D: drive
  • specify the run_sorter folder in the C: drive

I think this should work, and it would save you from moving large files around... You can then just save the sorting object to D: in a following call:

sorting = ss.run_sorter(..., folder="C:\...", docker_image=True)
sorting.save(folder="D:\...")

alejoe91 avatar Apr 12 '25 08:04 alejoe91