
Go IO Cookbook: code review

peterGo opened this issue 5 years ago • 6 comments

Nice job. I performed a Go code review: correct, maintainable, reasonably efficient, and readable. Here are some suggestions for your "Directly using io.Reader" code.

// Directly using io.Reader

package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

func main() {
	rdr := strings.NewReader("this is the stuff I'm reading")
	var data []byte
	buf := make([]byte, 0, 4)
	for {
		n, err := rdr.Read(buf[:cap(buf)])
		buf = buf[:n]

		data = append(data, buf...)

		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
	}
	fmt.Println(string(data))
}

Playground: https://play.golang.org/p/k1O3Qf5K39k

peterGo · Jan 06 '20 19:01

Thanks :)

I have a couple of questions:

  1. Why is it preferable to use the capacity of the `buf` variable rather than the length when instantiating it? Go will zero out the values anyway, won't it?
  2. If an error occurs mid-read and there are some bytes written to the buffer, do we necessarily want to still append those to our data variable? My original code was a little more verbose so that it didn't append to data in the event of a non-EOF error.

jesseduffield · Jan 08 '20 04:01

Q1. Why is it preferable to use the capacity of the buf variable rather than length when instantiating it? Go will zero-out the values anyway won't it?

We don't want to waste huge amounts of time testing and debugging, so we write provably correct and maintainable (which implies readable) programs that scale.

The buf byte slice variable contains the most recently read bytes. The number of bytes read is len(buf).

The statement

buf := make([]byte, 4)

is equivalent to

buf := make([]byte, 4, 4)

It has len(buf) = 4 and cap(buf) = 4, and the underlying byte array is initialized to zero (the byte zero-value). After initialization, the number of bytes read is zero, not 4, so a len(buf) of 4 is not correct. The correct buf initialization statement is

buf := make([]byte, 0, 4)

It has len(buf) = 0 and cap(buf) = 4, and the underlying byte array is initialized to zero (the byte zero-value).
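
As a quick sketch, the difference is visible by printing len and cap for both forms:

package main

import "fmt"

func main() {
	a := make([]byte, 4)    // len(a) = 4, cap(a) = 4: looks as if 4 bytes were already read
	b := make([]byte, 0, 4) // len(b) = 0, cap(b) = 4: no bytes read yet
	fmt.Println(len(a), cap(a)) // 4 4
	fmt.Println(len(b), cap(b)) // 0 4
}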

Consider the code after maintenance by different programmers has added many more lines of code, including code between buf initialization and the buf Read statement.

If we have a similar requirement elsewhere, is this code robust enough to copy? Is it a good code pattern?

Q2. If an error occurs mid-read and there are some bytes written to the buffer, do we necessarily want to still append those to our data variable? My original code was a little bit more verbose so that it didn't append to data in the event of a non-EOF error.

Yes, always. The documentation for type io.Reader:

Package io

type Reader

Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.
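
For a concrete illustration, the standard library's testing/iotest.DataErrReader wraps a Reader so that the final bytes and io.EOF arrive in the same Read call (one of the two allowed EOF behaviors). A sketch using it with the same loop shows that nothing is lost, because the loop appends the n > 0 bytes before checking err:

package main

import (
	"fmt"
	"io"
	"log"
	"strings"
	"testing/iotest"
)

func main() {
	// DataErrReader makes the final Read return the last bytes and io.EOF together.
	rdr := iotest.DataErrReader(strings.NewReader("this is the stuff I'm reading"))
	var data []byte
	buf := make([]byte, 0, 4)
	for {
		n, err := rdr.Read(buf[:cap(buf)])
		buf = buf[:n]

		// Append the n > 0 bytes before looking at err, as the documentation requires.
		data = append(data, buf...)

		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
	}
	fmt.Println(string(data)) // prints the full string; the final byte is not lost
}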

In real programs this would be encapsulated in a function. Return all the data and the error. Allow the function caller to decide.

Suggestion: consider encapsulating your examples in functions and methods for more realistic, less artificial Go code: io.Readers as function parameters; returning errors instead of panicking; returning nil instead of io.EOF; allowing the caller to handle errors; and so on.

For example,

// Directly using io.Reader

package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

func readAll(rdr io.Reader) ([]byte, error) {
	var data []byte
	buf := make([]byte, 0, 4)
	for {
		n, err := rdr.Read(buf[:cap(buf)])
		buf = buf[:n]

		data = append(data, buf...)

		if err != nil {
			if err == io.EOF {
				break
			}
			return data, err
		}
	}
	return data, nil
}

func main() {
	rdr := strings.NewReader("this is the stuff I'm reading")
	data, err := readAll(rdr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}

peterGo · Jan 08 '20 19:01

Thanks for your detailed explanation! I'll update the example to use the corrected slice instantiation and control flow. However, for now I'm going to leave it in the one main() func: one of my goals was brevity when writing this code, and I want to keep the examples as compact as possible for explanatory purposes (though I agree that your example better reflects real-world usage).

I'll also have a read through those docs. Thanks :)

jesseduffield · Jan 09 '20 07:01

I've done some digging and it looks like the capacity approach doesn't actually help us. When we call buf[:cap(buf)] (checked via fmt.Println(spew.Sdump(buf[:cap(buf)]))) it looks like we're zeroing-out that array one way or another.

Your argument about considering when other developers add more lines in between doesn't consider that even if you set the buffer to zero length, in between we can still do things that increase the capacity, meaning buf[:cap(buf)] won't defend us against issues there. What are your thoughts?

jesseduffield · Jan 09 '20 07:01

Q. Your argument about considering when other developers add more lines in between doesn't consider that even if you set the buffer to zero length, in between we can still do things that increase the capacity, meaning buf[:cap(buf)] won't defend us against issues there. What are your thoughts?

By design, the code considers all of that and takes it into account.

For a provably correct function, we establish an invariant whose meaning doesn't change during execution.

// buf contains the most recently read len(buf) bytes 
buf := make([]byte, 0, 4)
// ... buf is read-only
for {
    // ... buf is read-only
    n, err := rdr.Read(buf[:cap(buf)])
    buf = buf[:n]
    // ... buf is read-only
}
// ... buf is read-only

where

n, err := rdr.Read(buf[:cap(buf)])
buf = buf[:n]

is a condensed version of

p := buf[:cap(buf)]
n, err := rdr.Read(p)
buf = buf[:n]

In Go, arguments are passed by value, as if by assignment.

For each Read, p := buf[:cap(buf)] sets len(p) to the maximum buf capacity.

After n, err := rdr.Read(p), where 0 <= n <= len(p), buf = buf[:n] immediately reinstates the len(buf) invariant.
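
A minimal sketch of that pass-by-value point, with a hypothetical fill function standing in for a Read method:

package main

import "fmt"

// fill stands in for an io.Reader's Read: it writes into p and reports
// how many bytes it wrote. (A hypothetical helper, for illustration only.)
func fill(p []byte) int {
	return copy(p, "abc")
}

func main() {
	buf := make([]byte, 0, 4)
	n := fill(buf[:cap(buf)]) // fill sees a 4-byte slice; buf itself is unchanged
	fmt.Println(len(buf), n)  // 0 3
	buf = buf[:n]            // only this assignment reinstates the invariant
	fmt.Println(string(buf)) // abc
}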

It is simple and easy to inspect all other references to buf, visually and/or by tool, to verify that they are read-only. For example,

data = append(data, buf...)

is read-only, and

buf = buf[:cap(buf)]

is not read-only.

QED

If we encounter code that modifies buf in a buf read-only section of code, it must be rejected during code review, if not before.

peterGo · Jan 11 '20 03:01

Q. I've done some digging and it looks like the capacity approach doesn't actually help us. When we call buf[:cap(buf)] (checked via fmt.Println(spew.Sdump(buf[:cap(buf)]))) it looks like we're zeroing-out that array one way or another.

I don't understand what you are saying. The capacity approach is essential. What do you mean by "When we call buf[:cap(buf)]?" buf[:cap(buf)] is a slice expression, not a function call. What do you mean by "we're zeroing-out that array one way or another" and why is that relevant? Where is the code to reproduce your fmt.Println(spew.Sdump(buf[:cap(buf)])) check and what does the output show?

The Go documentation:

Package io https://golang.org/pkg/io/
type Reader https://golang.org/pkg/io/#Reader

Reader is the interface that wraps the basic Read method.

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)).

type Reader interface {
    Read(p []byte) (n int, err error)
}

There is no guarantee that any particular Read will read len(p) bytes.
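
For example, wrapping the same string in the standard library's testing/iotest.OneByteReader (a sketch, purely for illustration) forces every Read to return a single byte, no matter how large p is:

package main

import (
	"fmt"
	"strings"
	"testing/iotest"
)

func main() {
	rdr := iotest.OneByteReader(strings.NewReader("this is the stuff I'm reading"))
	p := make([]byte, 4)
	n, err := rdr.Read(p) // a legal Read: fewer bytes than len(p), no error
	fmt.Println(n, err)   // 1 <nil>
}

The corrected loop below copes with such short reads because every iteration passes buf[:cap(buf)] to Read and then trims to the n bytes actually read.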

// Directly using io.Reader

package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

func main() {
	rdr := strings.NewReader("this is the stuff I'm reading")
	var data []byte
	buf := make([]byte, 0, 4)
	for {
		n, err := rdr.Read(buf[:cap(buf)])
		buf = buf[:n]

		data = append(data, buf...)

		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
	}
	fmt.Println(string(data))
}

After the first Read, Read(buf) reuses buf with the previous iteration's n as len(buf),

n, err := rdr.Read(buf)
buf = buf[:n]

where

0 <= n <= len(buf) <= cap(buf)

For your example,

buf := make([]byte, 4)
len(buf) = 4; cap(buf) = 4

and, therefore,

n = 0, 1, 2, 3, or 4

len(buf) = 0, 1, 2, 3, or 4

To fix that bug, reset len(buf) for each Read,

p := buf[:cap(buf)]
n, err := rdr.Read(p)
buf = buf[:n]

or, more concisely,

n, err := rdr.Read(buf[:cap(buf)])
buf = buf[:n]

The fix also handles the first Read (a boundary condition),

buf := make([]byte, 0, bufcap)
for {
    n, err := rdr.Read(buf[:cap(buf)])
    buf = buf[:n]
}
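
As a sketch of the difference, the standard library's testing/iotest.HalfReader (which fills roughly half of whatever buffer it is handed) makes the shrinking read window easy to observe:

package main

import (
	"fmt"
	"strings"
	"testing/iotest"
)

func main() {
	rdr := iotest.HalfReader(strings.NewReader("this is the stuff I'm reading"))

	// Buggy pattern: Read(buf) without re-slicing to cap(buf).
	buf := make([]byte, 4)
	n, _ := rdr.Read(buf)
	buf = buf[:n]
	fmt.Println("after a short read, len(buf) =", len(buf)) // 2: the read window has shrunk

	// Fixed pattern: every Read sees the full capacity again.
	n, _ = rdr.Read(buf[:cap(buf)])
	fmt.Println("with buf[:cap(buf)], the next Read returned", n, "bytes") // 2, not 1
}

Without the re-slice, the second Read would see only a 2-byte buffer.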

peterGo · Jan 11 '20 09:01