shell-operator
shell-operator executor hangs when a hook prints lines that are too long
Expected behavior (what you expected to happen):
The operator executor does not hang when a hook prints very long lines, or the current 64K line limit is documented in the project.
Actual behavior (what actually happened):
When one of our hooks prints a very long line (>64K), the operator hangs. We are forced either to restart it or to implement a timeout that stops the hanging execution. In either case we cannot tell what was actually executed, and shell-operator gives no information about what happened.
Steps to reproduce: Starting from example https://github.com/flant/shell-operator/tree/master/examples/001-startup-shell
- Create a 100K text file in hook/long.txt
- Modify hook/shell-hook.sh:
#!/usr/bin/env bash
if [[ $1 == "--config" ]] ; then
  echo '{"configVersion":"v1", "onStartup": 1}'
else
  echo "OnStartup shell hook BEGIN >>>"
  cat long.txt
  cat long.txt
  cat long.txt
  echo ">>> END OnStartup shell hook"
fi
- Build the Docker image of the example
docker build -t "registry.mycompany.com/shell-operator:startup-shell" .
- Run the Docker image
docker run registry.mycompany.com/shell-operator:startup-shell
=> RESULT: Last log line displayed is {"binding":"","event":"onStartup","hook":"shell-hook.sh","level":"info","msg":"OnStartup shell hook BEGIN","output":"stdout","queue":"","task":"HookRun","time":"2021-06-15T07:45:39Z"}
and execution will hang until you interrupt it.
Environment:
- Shell-operator version: latest
- Kubernetes version: not relevant as issue can be reproduced with local Docker image
- Installation type (kubectl apply, helm chart, etc.): not relevant as issue can be reproduced with local Docker image
Anything else we should know?:
The issue comes from the current use of bufio.Scanner in https://github.com/flant/shell-operator/blob/master/pkg/executor/executor.go.
https://golang.org/pkg/bufio/#pkg-constants shows that the maximum line length is 64K (MaxScanTokenSize).
In our case the hook writes a longer line to the pipe, so scanner.Scan() fails and the read loop exits. The hook keeps writing to a pipe that nobody reads anymore, so the pipe buffer eventually fills up and the process hangs.
Additional information for debugging (if necessary):
You can read the Scanner error after the scanner.Scan() loop in https://github.com/flant/shell-operator/blob/master/pkg/executor/executor.go:
for scanner.Scan() {
    stdoutLogEntry.Info(scanner.Text())
}
fmt.Println(scanner.Err())
This prints the error "bufio.Scanner: token too long" in the logs.
Hello! Nice catch! It seems a rare situation, so I'll add a warning to the documentation. Feel free to create a pull request if this problem is crucial for you.
Hello, this is not so rare. When exploring shell-operator for the first time, you wonder what is inside the binding context and try to print it somehow, and as a result the operator stops working.
I am also wondering whether it would be possible to recognize JSON output from a script and log it as JSON rather than as a msg string, which might make it a little easier to consume.
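As a rough illustration of the JSON idea, the executor could try to parse each output line as a JSON object and fall back to a plain message otherwise. This is only a sketch under stated assumptions: logFields and the "hook" field name are hypothetical and do not reflect how shell-operator structures its log records.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logFields (hypothetical helper) builds the fields of one log record from a
// hook output line. A line that parses as a JSON object is nested under the
// assumed "hook" key; anything else is kept as a plain "msg" string.
func logFields(line string) map[string]interface{} {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(line), &obj); err == nil && obj != nil {
		return map[string]interface{}{"hook": obj}
	}
	return map[string]interface{}{"msg": line}
}

func main() {
	lines := []string{`{"binding":"onStartup","count":1}`, "plain text"}
	for _, line := range lines {
		out, err := json.Marshal(logFields(line))
		if err != nil {
			fmt.Println("marshal error:", err)
			continue
		}
		fmt.Println(string(out))
	}
}
```

Nesting the parsed object under a separate key avoids collisions with fields the logger already sets, such as "msg", "level", or "time".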