Use tempdir for output in run_quick.py to improve efficiency with network storage

Open Copilot opened this issue 5 months ago • 0 comments

Description

This PR addresses issue #[issue_number] by modifying run_quick.py to use a temporary directory for the workflow output, improving efficiency when using network drives for storage.

Changes

Previously, run_quick.py already used a temporary directory for creating the input BIDS dataset, but wrote workflow output directly to the final output directory. This could cause significant performance degradation when the output directory is on network storage (e.g., NFS, CIFS), as all intermediate files and workflow I/O would traverse the network.

Now, the workflow writes to a temporary output directory (a subdirectory within the same temp_dir used for input), and only the final subject results (hippunfold/sub-{subject}/) are copied to the final output location after successful completion.

Key Implementation Details

Temporary output directory: Created as temp_dir/output/ alongside the temporary input BIDS dataset
Workflow execution: Runs entirely in the temporary location (e.g., local disk)
Result copy: After successful completion, copies only hippunfold/sub-{subject}/ to the final output directory
Error handling: On workflow failure, no copy is performed, leaving the final output unchanged
Overwrite behavior: If the subject directory already exists in the final output, it is replaced

Example

hippunfold-quick \
  --input /data/subject.nii.gz \
  --output /network/storage/results \
  --subject 001 \
  --modality T1w

Workflow execution (local disk):

/tmp/tmpXXXXX/
├── anat/sub-001/              # Input BIDS
│   └── sub-001_T1w.nii.gz
└── output/                    # All workflow I/O happens here
    ├── hippunfold/sub-001/    # Final results
    ├── work/                  # Intermediate files (not copied)
    ├── logs/                  # Logs (not copied)
    └── .snakemake/            # Metadata (not copied)

Final output (network storage):

/network/storage/results/
└── hippunfold/
    └── sub-001/               # Only this gets copied back
        ├── anat/
        ├── surf/
        ├── coords/
        └── qc/

Benefits

Performance: All workflow I/O happens on local disk, avoiding network overhead
Efficiency: Only final results are copied to network storage, not intermediate files (work/, logs/, .snakemake/)
Safety: Atomic updates - only successful runs update the final output
Compatibility: Fully backward compatible, no changes to CLI arguments or behavior

Testing

Code formatted with black and isort
Syntax validation passed
Manual testing of copy logic verified
No breaking changes to existing functionality

Original prompt

This section details on the original issue you should resolve

<issue_title>Use tempdir for writing output dir in run_quick.py</issue_title> <issue_description>Using a tempdir (eg local disk) for writing to the output dir can improve efficiency when using network drives for storage otherwise. The run_quick.py console script already uses a tempdir to create the input bids dataset, we should also use a sub-directory in that folder as the hippunfold output dir, so the workflow gets run from there. Would need to then add a command to copy back from there to the final output dir (copying only the sub-{subject} dir) after successful completion.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes khanlab/hippunfold#518

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Oct 01 '25 14:10 Copilot