script
Parallel execution for ExecForEach and EachLine
Hi, this issue follows up on the discussion in #86. Currently, script runs pipelines fully synchronously. In #34 and #59, people came up with some brilliant ideas for asynchronous pipeline streaming, but, as @bitfield mentioned, those designs and implementations seem too complicated.
Here, I want to suggest a compromise: adding the methods EachLineConc() and ExecForEachConc(), which would have the same input and output interface as EachLine() and ExecForEach(), but would run their work in parallel within the method. (For example, script.Slice(make([]string, 5)).ExecForEachConc("sleep 1") should return in about 1 second, rather than 5 seconds.)
The ExecForEachConc() method covers the same use cases as GNU parallel and the & operator in bash, for example:
https://blogs.sas.com/content/sgf/2021/04/14/using-shell-scripts-for-massively-parallel-processing/
https://unix.stackexchange.com/questions/103920/parallelize-a-bash-for-loop/103922
https://askubuntu.com/questions/431478/decompressing-multiple-files-at-once
https://superuser.com/questions/538164/how-many-instances-of-ffmpeg-commands-can-i-run-in-parallel/547340#547340
Apart from improving efficiency, I sometimes want to run programs concurrently for testing purposes. For example, two weeks ago I wrote a file transfer service, and I wanted to test whether the receiver application behaves correctly when multiple files are sent to it at the same time. It would be nice if I could run something like:
script.ListFiles("FILE*.in").ExecForEachConc("./wSender --ip 10.0.0.1 --port 8888 --file {{.}}").Stdout()
Note: it seems that we can't simply append & to the commands in ExecForEach() for this purpose. For example, script.Slice(make([]string, 5)).ExecForEach("sleep 1 &") returns instantly, because there is no wait. Also, if I change the argument to "bash -c 'sleep 1' &" or "bash -c 'sleep 1 &'", the program still runs for 5 seconds.
The implementation shouldn't be too hard: we just need to write EachLineConc() and have ExecForEachConc() call it.
Here's a first-draft implementation (which could certainly be optimized further):
// Lives in package script; needs the bufio, strings and sync imports.
func (p *Pipe) EachLineConc(process func(string, *strings.Builder)) *Pipe {
	if p == nil || p.Error() != nil {
		return p
	}
	// Read all input lines first, so that each one can be processed concurrently.
	scanner := bufio.NewScanner(p.Reader)
	inputs := []string{}
	for scanner.Scan() {
		inputs = append(inputs, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		p.SetError(err)
		return p
	}
	// One output slot per line, so the joined result keeps the input order.
	outputs := make([]string, len(inputs))
	var wg sync.WaitGroup
	wg.Add(len(inputs))
	for index, input := range inputs {
		go func(index int, input string) {
			defer wg.Done() // always signal completion, even if process panics
			var output strings.Builder
			process(input, &output)
			outputs[index] = output.String() // each goroutine writes only its own slot
		}(index, input)
	}
	wg.Wait()
	if p.Error() != nil {
		return p
	}
	return Echo(strings.Join(outputs, ""))
}
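To show how ExecForEachConc could sit on top of this, here is a self-contained sketch outside script. It renders the command template once per line (with the line as {{.}}, as in the ExecForEach example above), runs each command in its own goroutine, and joins the outputs in input order. The function name execForEachConc is hypothetical, and the rendered command is split with strings.Fields, so quoted arguments are NOT supported in this sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
	"sync"
	"text/template"
)

// execForEachConc (hypothetical) runs one command per input line concurrently
// and returns the combined outputs in input order, plus the first error.
// Simplification: arguments are split on whitespace; quoting is not handled.
func execForEachConc(lines []string, cmdTemplate string) (string, error) {
	tpl, err := template.New("cmd").Parse(cmdTemplate)
	if err != nil {
		return "", err
	}
	// Render every command line up front, so a template error aborts before anything runs.
	argv := make([][]string, len(lines))
	for i, line := range lines {
		var buf bytes.Buffer
		if err := tpl.Execute(&buf, line); err != nil {
			return "", err
		}
		argv[i] = strings.Fields(buf.String())
	}
	outputs := make([]string, len(lines))
	errs := make([]error, len(lines))
	var wg sync.WaitGroup
	for i, args := range argv {
		if len(args) == 0 {
			continue // empty command line: nothing to run
		}
		wg.Add(1)
		go func(i int, args []string) {
			defer wg.Done()
			out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
			outputs[i], errs[i] = string(out), err
		}(i, args)
	}
	wg.Wait()
	result := strings.Join(outputs, "")
	for _, e := range errs {
		if e != nil {
			return result, e // report the first failure in input order
		}
	}
	return result, nil
}

func main() {
	out, err := execForEachConc([]string{"a", "b", "c"}, "echo line {{.}}")
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Print(out)
}
```

Collecting each goroutine's result into its own slot, as the draft above does, keeps the output deterministic even though the commands finish in arbitrary order.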
Great! Let's see if this issue gets some traction with others who want to write concurrent scripts, and see what they think of the proposed API.
Hi, I just found this project today and love the idea! I thought of exactly this while reading through the README. +1 for this one.
Great! Do you want to try and come up with a real-world program that would use these constructs?
Love the ability this provides.
Plus one for this idea.
Can you suggest an example where this might be useful, @tjayrush? Ideally, write a script program using this construct that solves a user problem.
It's obviously useful. Every application I can think of where something can be done concurrently is useful even if only because it's way faster. Use your imagination.
I understand perhaps you're feeling frustrated, @tjayrush, and for all I know you're having a bad day for reasons unrelated to this issue or this project. But the tone of your comment is ill-judged, I think. I invite you to reflect on it and consider whether it's the sort of comment you'd like to receive from a contributor to one of your own projects, or even from a co-worker.