Go IO Cookbook: code review
Nice job. I reviewed your Go code for correctness, maintainability, reasonable efficiency, and readability. Here are some suggestions for your "Directly using io.Reader" code.
// Directly using io.Reader
package main

import (
    "fmt"
    "io"
    "log"
    "strings"
)

func main() {
    rdr := strings.NewReader("this is the stuff I'm reading")
    var data []byte
    buf := make([]byte, 0, 4)
    for {
        n, err := rdr.Read(buf[:cap(buf)])
        buf = buf[:n]
        data = append(data, buf...)
        if err != nil {
            if err == io.EOF {
                break
            }
            log.Fatal(err)
        }
    }
    fmt.Println(string(data))
}
Playground: https://play.golang.org/p/k1O3Qf5K39k
Thanks :)
I have a couple of questions:
- Why is it preferable to use the capacity of the buf variable rather than its length when instantiating it? Go will zero out the values anyway, won't it?
- If an error occurs mid-read and there are some bytes written to the buffer, do we necessarily want to still append those to our data variable? My original code was a little more verbose so that it didn't append to data in the event of a non-EOF error.
Q1. Why is it preferable to use the capacity of the buf variable rather than its length when instantiating it? Go will zero out the values anyway, won't it?
We don't want to waste huge amounts of time testing and debugging, so we write provably correct and maintainable (which implies readable) programs that scale.
The buf byte slice variable contains the most recently read bytes. The number of bytes read is len(buf).
The statement
buf := make([]byte, 4)
is equivalent to
buf := make([]byte, 4, 4)
It has len(buf) = 4 and cap(buf) = 4, and the underlying byte array is initialized to zero (the byte zero-value). But after initialization the number of bytes read is zero, not 4, so len(buf) = 4 violates the invariant; it's not correct. The correct buf initialization statement is
buf := make([]byte, 0, 4)
It has len(buf) = 0 and cap(buf) = 4, and the underlying byte array is initialized to zero (the byte zero-value).
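A quick check of the two forms (a minimal sketch; only make and the builtins len and cap are involved):

package main

import "fmt"

func main() {
    a := make([]byte, 4)    // len 4, cap 4: claims 4 bytes were read
    b := make([]byte, 0, 4) // len 0, cap 4: no bytes read yet
    fmt.Println(len(a), cap(a)) // 4 4
    fmt.Println(len(b), cap(b)) // 0 4
}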
Consider the code after maintenance by different programmers has added many more lines of code, including code between buf initialization and the buf Read statement.
If we have a similar requirement elsewhere, is this code robust enough to copy? Is it a good code pattern?
Q2. If an error occurs mid-read and there are some bytes written to the buffer, do we necessarily want to still append those to our data variable? My original code was a little more verbose so that it didn't append to data in the event of a non-EOF error.
Yes, always. The documentation for type io.Reader:
Package io
type Reader
Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.
In real programs this would be encapsulated in a function. Return all the data and the error. Allow the function caller to decide.
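For instance, testing/iotest.DataErrReader wraps a reader so that the final data and io.EOF arrive in the same Read call, one of the two allowed EOF behaviors. The loop above handles it because it appends buf before checking err (a minimal sketch):

package main

import (
    "fmt"
    "strings"
    "testing/iotest"
)

func main() {
    // The final bytes and io.EOF are returned together.
    r := iotest.DataErrReader(strings.NewReader("abc"))
    buf := make([]byte, 0, 4)
    n, err := r.Read(buf[:cap(buf)])
    fmt.Println(n, err) // 3 EOF: process the 3 bytes before acting on err
}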
Suggestion: consider encapsulating your examples in functions and methods for more realistic, less artificial Go code: io.Readers as function parameters; returning errors instead of panicking; returning nil instead of io.EOF; allowing the caller to handle errors; and so on.
For example,
// Directly using io.Reader
package main

import (
    "fmt"
    "io"
    "log"
    "strings"
)

func readAll(rdr io.Reader) ([]byte, error) {
    var data []byte
    buf := make([]byte, 0, 4)
    for {
        n, err := rdr.Read(buf[:cap(buf)])
        buf = buf[:n]
        data = append(data, buf...)
        if err != nil {
            if err == io.EOF {
                break
            }
            return data, err
        }
    }
    return data, nil
}

func main() {
    rdr := strings.NewReader("this is the stuff I'm reading")
    data, err := readAll(rdr)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
Thanks for your detailed explanation! I'll update the example to use the corrected slice instantiation and control flow. For now, though, I'm going to leave it in the one main() func: one of my goals was brevity, and I want to keep the examples as compact as possible for explanatory purposes (though I agree that your example better reflects real-world usage).
I'll also have a read through those docs. Thanks :)
I've done some digging, and it looks like the capacity approach doesn't actually help us. When we call buf[:cap(buf)] (checked via fmt.Println(spew.Sdump(buf[:cap(buf)]))), it looks like we're zeroing out that array one way or another.
Your argument about other developers adding more lines in between doesn't consider that even if you set the buffer to have zero length, we can still do things in between that increase the capacity, meaning buf[:cap(buf)] won't defend us against issues there. What are your thoughts?
Q. Your argument about other developers adding more lines in between doesn't consider that even if you set the buffer to have zero length, we can still do things in between that increase the capacity, meaning buf[:cap(buf)] won't defend us against issues there. What are your thoughts?
By design, the code considers all of that and takes it into account.
For a provably correct function, we establish an invariant whose meaning doesn't change during execution.
// buf contains the most recently read len(buf) bytes
buf := make([]byte, 0, 4)
// ... buf is read-only
for {
    // ... buf is read-only
    n, err := rdr.Read(buf[:cap(buf)])
    buf = buf[:n]
    // ... buf is read-only
}
// ... buf is read-only
where
n, err := rdr.Read(buf[:cap(buf)])
buf = buf[:n]
is a condensed version of
p := buf[:cap(buf)]
n, err := rdr.Read(p)
buf = buf[:n]
In Go, arguments are passed by value, as if by assignment.
For each Read, p := buf[:cap(buf)] sets len(p) to the full buf capacity.
After n, err := rdr.Read(p), where 0 <= n <= len(p), buf = buf[:n] immediately reinstates the len(buf) invariant.
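To make the pass-by-value point concrete, here's a minimal sketch (fill is a hypothetical stand-in for a Reader's Read method):

package main

import "fmt"

// fill writes into the backing array of p, as a Read method would,
// and returns the number of bytes written.
func fill(p []byte) int {
    return copy(p, "abc")
}

func main() {
    buf := make([]byte, 0, 4)
    n := fill(buf[:cap(buf)]) // fill receives a copy of the slice header
    buf = buf[:n]             // reslice to expose the written bytes
    fmt.Println(string(buf))  // abc
}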
It is simple and easy to inspect all other references to buf, visually and/or by tool, to verify that they are read-only. For example,
data = append(data, buf...)
is read-only, and
buf = buf[:cap(buf)]
is not read-only.
QED
If we encounter code that modifies buf in a buf read-only section of code, it must be rejected during code review, if not before.
Q. I've done some digging, and it looks like the capacity approach doesn't actually help us. When we call buf[:cap(buf)] (checked via fmt.Println(spew.Sdump(buf[:cap(buf)]))), it looks like we're zeroing out that array one way or another.
I don't understand what you are saying. The capacity approach is essential. What do you mean by "When we call buf[:cap(buf)]?" buf[:cap(buf)] is a slice expression, not a function call. What do you mean by "we're zeroing-out that array one way or another" and why is that relevant? Where is the code to reproduce your fmt.Println(spew.Sdump(buf[:cap(buf)])) check and what does the output show?
The Go documentation:
Package io (https://golang.org/pkg/io/), type Reader (https://golang.org/pkg/io/#Reader):

type Reader interface {
    Read(p []byte) (n int, err error)
}

Reader is the interface that wraps the basic Read method. Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)).
There is no guarantee that any particular Read will read len(p) bytes.
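To see a short read in action, testing/iotest.OneByteReader wraps a reader so that each Read returns at most one byte (a minimal sketch):

package main

import (
    "fmt"
    "strings"
    "testing/iotest"
)

func main() {
    r := iotest.OneByteReader(strings.NewReader("abc"))
    buf := make([]byte, 0, 4)
    n, err := r.Read(buf[:cap(buf)])
    fmt.Println(n, err) // 1 <nil>: one byte, even though len(p) is 4
}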
If instead we call rdr.Read(buf), without resetting the length, each Read after the first reuses buf with the previous value of n as len(buf),
n, err := rdr.Read(buf)
buf = buf[:n]
where
0 <= n <= len(buf) <= cap(buf)
For your example,
buf := make([]byte, 4)
len(buf) = 4; cap(buf) = 4
and, therefore, after each Read,
n = 0, 1, 2, 3, or 4
len(buf) = 0, 1, 2, 3, or 4
Once a short Read shrinks len(buf), no later Read(buf) can read more than that, and after a zero-byte Read the buffer length is stuck at zero, as the sketch below shows.
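A minimal sketch of the shrinking buffer (again using testing/iotest.OneByteReader to force a short read):

package main

import (
    "fmt"
    "strings"
    "testing/iotest"
)

func main() {
    r := iotest.OneByteReader(strings.NewReader("hello"))
    buf := make([]byte, 4)
    n, _ := r.Read(buf) // short read: n = 1
    buf = buf[:n]
    fmt.Println(len(buf), cap(buf)) // 1 4: later Read(buf) calls see at most 1 byte
}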
To fix that bug, reset len(buf) for each Read,
p := buf[:cap(buf)]
n, err := rdr.Read(p)
buf = buf[:n]
or, more concisely,
n, err := rdr.Read(buf[:cap(buf)])
buf = buf[:n]
The fix also handles the first Read (a boundary condition),
buf := make([]byte, 0, bufcap)
for {
    n, err := rdr.Read(buf[:cap(buf)])
    buf = buf[:n]
    // ... process buf, check err, break on io.EOF ...
}
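As a quick check of that boundary condition (a minimal sketch): even though len(buf) is zero before the first Read, buf[:cap(buf)] hands Read the full buffer.

package main

import "fmt"

func main() {
    buf := make([]byte, 0, 4)
    p := buf[:cap(buf)]
    fmt.Println(len(p), cap(p)) // 4 4: the first Read can fill the whole buffer
}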