cmd icon indicating copy to clipboard operation
cmd copied to clipboard

[streaming] Line breaks and buffer overflows (broken pipes)

Open 13k opened this issue 1 year ago • 1 comments

This is similar to #4, but it's more related to child process output content rather than output/buffer sizes.

Child processes that output a lot of data, or run long enough, but don't emit line breaks will eventually cause buffer overflows and therefore broken pipe errors.

This is easily reproducible:

package main

import (
	"fmt"
	"log"
	"math/rand/v2"
	"os"
	"time"

	"github.com/go-cmd/cmd"
)

func main() {
	if len(os.Args) > 1 {
		server()
	} else {
		client()
	}
}

func server() {
	ttl := time.Second * 10
	timeout := time.After(ttl)

	fmt.Printf("[server] start (running for %s)\n", ttl)

	for {
		n := time.Duration(rand.Int64N(10))
		step := time.After(time.Millisecond * n)

		select {
		case <-step:
			fmt.Printf("\033[2K\r[server] %d", n)
		case <-timeout:
			fmt.Println("\033[2K\r[server] exit")
			return
		}
	}

}

func client() {
	selfExe, err := os.Executable()

	if err != nil {
		panic(err)
	}

	svCmd := []string{selfExe, "server"}

	log.Printf("[client] launching server: %q", svCmd)

	cmd := cmd.NewCmdOptions(cmd.Options{Streaming: true}, selfExe, "server")

	go func() {
		for output := range cmd.Stdout {
			fmt.Fprintln(os.Stdout, output)
		}
	}()

	go func() {
		for output := range cmd.Stderr {
			fmt.Fprintln(os.Stderr, output)
		}
	}()

	log.Println("[client] wait server")

	status := <-cmd.Start()

	if status.Error != nil {
		log.Fatal(status.Error)
	}

	log.Printf("[client] server status: %v", status)
	log.Println("[client] exit")
}

You can notice that in each server loop iteration, it doesn't emit a line break to the output, it uses a ANSI control sequence (\033[2K) to clear the line and return the cursor to the start of the line (\r).

go-cmd will keep accumulating output, because it doesn't see a \n and will never send a message to the output channel. This causes two issues.

The overflow issue

Eventually go-cmd's internal buffer overflows:

> go run streaming.go
2024/05/31 14:17:18 [client] launching server: ["/tmp/go-build653253379/b001/exe/streaming" "server"]
2024/05/31 14:17:18 [client] wait server
[server] start (running for 10s)
2024/05/31 14:17:24 signal: broken pipe
exit status 1

I believe there are no buffer size adjustments that could solve this.

The issue with buffering based on assumptions about the data format

go-cmd assumes the output will be line oriented and sends channel messages for each line.

I imagine this is done for convenience, but it breaks streaming flows that could be important, like showing progress status (e.g.: rclone, aria2c, wget, curl, etc).

Even for simpler cases (ANSI control sequences are not used) or smaller cases (all of the output would fit in the buffer), for example when a process prints progress on the same line (like a series of dots or percentages), go-cmd's implementation breaks those flows (e.g.: some mkfs implementations do this if I remember correctly).

Another broken use case would be processes that emit very long lines. It could be the case for minifiers/uglyfiers/obfuscators, or language servers (LSPs) that are known to emit long JSON lines to stdio, for example. go-cmd would have to keep increasing buffer size to accomodate for this.

Proposed solution

I believe the solution to both issues is to just do byte oriented streaming with buffering: don't make assumptions about data format and emit chunks downstream whenever the buffer is filled. The client is responsible to reassemble the data if needed.

13k avatar May 31 '24 17:05 13k