Use tempdir for output in run_quick.py to improve efficiency with network storage
Description
This PR addresses issue #[issue_number] by modifying run_quick.py to use a temporary directory for the workflow output, improving efficiency when using network drives for storage.
Changes
Previously, run_quick.py used a temporary directory to create the input BIDS dataset but wrote the workflow output directly to the final output directory. When that directory is on network storage (e.g., NFS, CIFS), this can significantly degrade performance, because all intermediate files and other workflow I/O traverse the network.
Now, the workflow writes to a temporary output directory (a subdirectory within the same temp_dir used for input), and only the final subject results (hippunfold/sub-{subject}/) are copied to the final output location after successful completion.
Key Implementation Details
- Temporary output directory: Created as temp_dir/output/ alongside the temporary input BIDS dataset
- Workflow execution: Runs entirely in the temporary location (e.g., local disk)
- Result copy: After successful completion, copies only hippunfold/sub-{subject}/ to the final output directory
- Error handling: On workflow failure, no copy is performed, leaving the final output unchanged
- Overwrite behavior: If the subject directory already exists in the final output, it is replaced
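The flow described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in run_quick.py. `run_with_temp_output` and `copy_subject_results` are hypothetical names. `run_workflow` is a placeholder callable standing in for however the script actually launches the Snakemake workflow. The copy-back runs only if `run_workflow` returns without raising, and an existing subject directory is deleted before the new one is copied in.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def copy_subject_results(temp_output: Path, final_output: Path, subject: str) -> Path:
    """Copy only hippunfold/sub-{subject}/ from the temp output to the final output."""
    src = temp_output / "hippunfold" / f"sub-{subject}"
    dest = final_output / "hippunfold" / f"sub-{subject}"
    if dest.exists():
        # Overwrite behavior: replace any existing results for this subject
        shutil.rmtree(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest


def run_with_temp_output(
    run_workflow: Callable[[Path, Path], None],  # hypothetical stand-in for the workflow launch
    final_output: Path,
    subject: str,
) -> Path:
    with tempfile.TemporaryDirectory() as tmp:
        temp_dir = Path(tmp)
        bids_dir = temp_dir  # the input BIDS dataset is staged here, as before
        temp_output = temp_dir / "output"  # workflow output sits next to the input
        temp_output.mkdir()
        # If this raises, the exception propagates and nothing is copied,
        # so the final output directory is left unchanged.
        run_workflow(bids_dir, temp_output)
        return copy_subject_results(temp_output, Path(final_output), subject)
```

The copy runs after the workflow call inside the `with` block, so an exception from the workflow skips it. `TemporaryDirectory` still cleans up the local scratch space on the way out.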
Example
hippunfold-quick \
--input /data/subject.nii.gz \
--output /network/storage/results \
--subject 001 \
--modality T1w
Workflow execution (local disk):
/tmp/tmpXXXXX/
├── sub-001/anat/              # Input BIDS
│   └── sub-001_T1w.nii.gz
└── output/                    # All workflow I/O happens here
    ├── hippunfold/sub-001/    # Final results
    ├── work/                  # Intermediate files (not copied)
    ├── logs/                  # Logs (not copied)
    └── .snakemake/            # Metadata (not copied)
Final output (network storage):
/network/storage/results/
└── hippunfold/
    └── sub-001/               # Only this gets copied back
        ├── anat/
        ├── surf/
        ├── coords/
        └── qc/
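The temporary layout in the first tree could be prepared along these lines. This sketch assumes the input image is copied (not linked) into a BIDS-style sub-{subject}/anat/ folder and renamed to sub-{subject}_{modality}.nii.gz, matching the file name shown in the tree. `stage_temp_layout` is a hypothetical helper, and the actual script may differ. A strictly valid BIDS dataset also needs a dataset_description.json at its root, which is left out here to match the tree.

```python
import shutil
from pathlib import Path


def stage_temp_layout(
    temp_dir: Path, input_image: Path, subject: str, modality: str
) -> tuple[Path, Path]:
    """Create the input BIDS dataset and an empty output dir inside temp_dir.

    Hypothetical helper: the folder and file naming are assumptions for illustration.
    """
    anat_dir = temp_dir / f"sub-{subject}" / "anat"
    anat_dir.mkdir(parents=True, exist_ok=True)
    # BIDS-style file name, e.g. sub-001_T1w.nii.gz
    shutil.copyfile(input_image, anat_dir / f"sub-{subject}_{modality}.nii.gz")
    # The workflow output lives next to the input, on the same (local) disk
    output_dir = temp_dir / "output"
    output_dir.mkdir(exist_ok=True)
    return temp_dir, output_dir
```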
Benefits
- Performance: All workflow I/O happens on local disk, avoiding network overhead
- Efficiency: Only final results are copied to network storage, not intermediate files (work/, logs/, .snakemake/)
- Safety: Only successful runs update the final output; a failed run leaves it untouched
- Compatibility: No changes to CLI arguments; subject results end up in the same hippunfold/sub-{subject}/ location as before
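Suppose the overwrite is implemented by deleting the existing subject directory and then copying the new one. In that case the update is not strictly atomic. A copy that fails partway through (for example, on a flaky network mount) leaves the old results gone and the new ones incomplete. One way to narrow that window is to copy into a hidden staging directory next to the destination, then swap it in with renames, which are fast within a single directory. This is shown here as an alternative technique, not what this PR does. `replace_dir_staged` is a hypothetical helper.

```python
import os
import shutil
import uuid
from pathlib import Path


def replace_dir_staged(src: Path, dest: Path) -> None:
    """Copy src to dest, swapping it in only after the copy has fully succeeded."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tag = uuid.uuid4().hex
    staging = dest.parent / f".{dest.name}.staging-{tag}"
    backup = dest.parent / f".{dest.name}.old-{tag}"
    try:
        # Slow step (network copy); dest is untouched while this runs
        shutil.copytree(src, staging)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    had_old = dest.exists()
    if had_old:
        # Move the old results aside; a rename within one directory is fast
        os.rename(dest, backup)
    try:
        os.rename(staging, dest)
    except BaseException:
        # Put the old results back if the swap fails, and drop the staged copy
        if had_old:
            os.rename(backup, dest)
        shutil.rmtree(staging, ignore_errors=True)
        raise
    if had_old:
        shutil.rmtree(backup)
```

There is still a brief moment between the two renames when the destination does not exist, so this narrows the window rather than closing it entirely.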
Testing
- Code formatted with black and isort
- Syntax validation passed
- Copy logic verified manually
- No breaking changes to existing functionality
Original prompt
<issue_title>Use tempdir for writing output dir in run_quick.py</issue_title>
<issue_description>Using a tempdir (e.g. local disk) for writing the output dir can improve efficiency when network drives are used for storage. The run_quick.py console script already uses a tempdir to create the input BIDS dataset; we should also use a sub-directory in that folder as the hippunfold output dir, so the workflow runs from there. We would then need to add a command to copy back from there to the final output dir (copying only the sub-{subject} dir) after successful completion.</issue_description>