Ability to pipe stdout -> stdin between processes
This suggestion / request has come up several times, so I wanted to collect the latest thread into a GitHub issue (cc @nh13 @kmhernan @mahesh-panchal @adamrtalbot @muffato).
Unix pipes are a powerful way to stream data from one process to another, without needing to write intermediate data to disk. Currently this is not possible to do between processes in Nextflow. Current practice is either to write intermediate files, or to put multiple tools into a single process and pipe within that single script block.
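To make the two current workarounds concrete, here is a minimal plain-shell sketch. The `produce` and `consume` functions and the file name `intermediate.txt` are hypothetical stand-ins invented for illustration, not tools from the thread. Variant 1 mirrors two separate processes exchanging a file; variant 2 mirrors a single script block that pipes internally.

```shell
set -eu

# Hypothetical stand-ins for two pipeline tools.
produce() { printf 'b\na\nb\n'; }
consume() { sort | uniq -c | awk '{print $2 "=" $1}'; }

workdir=$(mktemp -d)
# Clean up the scratch directory when the script exits.
trap 'rm -rf "$workdir"' EXIT

# Variant 1: write an intermediate file to disk, then read it back
# (roughly what two separate processes have to do today).
produce > "$workdir/intermediate.txt"
via_file=$(consume < "$workdir/intermediate.txt")

# Variant 2: stream stdout -> stdin with no intermediate file
# (only possible today inside a single script block).
via_pipe=$(produce | consume)

echo "$via_file"
echo "$via_pipe"
```

Both variants produce the same result; the difference is only whether the intermediate data touches the disk.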
Support for this in Nextflow is not a new request, but it is technically challenging for several reasons:
- Making this work with distributed (cloud) clusters
  - May be possible with task batching? https://github.com/nextflow-io/nextflow/pull/3909
- Figuring out how publishing + retry would work
Piping output between containers does work:
Singularity
singularity exec img1.sif cmd1 | singularity exec img2.sif cmd2
Docker
docker run ubuntu printf "line1\nline2\n" | docker run -i ubuntu grep line2 | docker run -i ubuntu sed 's/line2/line3/g'
@mahesh-panchal has written a minimal example demo using named pipes:
And this was the demo I wrote for using named pipes. The issues there were cleanup and potential process deadlock. It would likely work with containers too, though. So just for completeness: one can send a pipe, but cleaning up (i.e. removing the pipe afterwards) is not simple because of the working directory isolation.
workflow {
    MKFIFO()
    SENDER( params.message, MKFIFO.out.pipe )
    RECEIVER( MKFIFO.out.pipe ) | view
}

// Creates the named pipe and emits its path
process MKFIFO {
    script:
    """
    mkfifo mypipe
    """

    output:
    path "mypipe", emit: pipe
}

// Writes the message into the pipe; the write blocks until a reader opens it
process SENDER {
    input:
    val message
    path pipename

    script:
    """
    echo $message > $pipename
    """

    output:
    path pipename
}

// Reads from the pipe and emits what it receives on stdout
process RECEIVER {
    input:
    path pipename

    script:
    """
    cat $pipename
    """

    output:
    stdout
}
And this is bad practice as it could easily lead to a process deadlock. The reason for this structure is that named pipes block until they're read from (i.e. they stop the process from completing).
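The blocking behaviour can be reproduced outside Nextflow with a small shell sketch. This is an illustration under assumptions, not part of the original demo: it assumes a `timeout` command is available (e.g. from GNU coreutils, which reports exit status 124 when the time limit is hit), and the scratch directory is created here only so the pipe can be removed afterwards. The first step shows a writer hanging because nothing reads from the pipe; the second shows both sides completing when they run concurrently.

```shell
set -eu

workdir=$(mktemp -d)
# Remove the FIFO (and its directory) when the script exits.
trap 'rm -rf "$workdir"' EXIT
mkfifo "$workdir/mypipe"

# 1. Writer with no reader: opening the FIFO for writing blocks,
#    so `timeout` kills the writer after 1 second.
status=0
timeout 1 sh -c 'echo hello > "$1"' sh "$workdir/mypipe" || status=$?
echo "writer without reader: exit $status"

# 2. Writer and reader running at the same time: both complete.
echo hello > "$workdir/mypipe" &
received=$(cat "$workdir/mypipe")
wait
echo "reader received: $received"
```

If the sender and receiver tasks are never running at the same time, the workflow ends up in the situation from step 1, which is the deadlock risk described above.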