simplefeatures wkb: add benchmarking script

as mentioned in https://github.com/peterstace/simplefeatures/pull/421, this PR adds the benchmarking script used to generate the diffs shown on that PR description.

I'm opening this as a DRAFT since I'm not happy with the current state of this and would like some feedback about how to make it better before opening it up for merging:

the test fixtures are a copy->paste of the ones from wkb_test.go, how best to share these?
the decode function is a copy of hexStringToBytes with the t *testing.T removed

I tried a few things but none felt quite right:

create a wkb_util_test.go file which has the hex decoder and the PostGIS/Spatialite fixtures in it
create a testdata/xxx file which could be shared.

What do you think? what's the best way to share these between the wkb_test.go and wkb_bench_test.go files?

Usage:

go test -bench=. -count=5 -test.run=none -benchmem geom/wkb_bench_test.go > master.bench

# switch branch
go test -bench=. -count=5 -test.run=none -benchmem geom/wkb_bench_test.go > pr.bench

# note the count=5 above, the benchstat command requires each file 
# contains the concatenated output of several runs.
benchstat master.bench pr.bench

Oct 27 '21 09:10 missinglink

would it make sense to store the wkb fixtures in []byte instead of string? wrangling hex strings in order to test binary operations is a bit of extra work and IMO they don't offer much more in terms of readability 🤷‍♂️

Oct 27 '21 09:10 missinglink

What do you think? what's the best way to share these between the wkb_test.go and wkb_bench_test.go files?

I think there are two separate things that would be ideal to share between wkb_test.go and wkb_bench_test.go:

The test case definitions
The hexStringToBytes/decode function definition.

First, the test case definitions, i.e. we don't want to have duplicates of definitions such as:

{
    wkb: "0101000000000000000000f87f000000000000f87f",
    wkt: "POINT EMPTY",
},

The way I'd probably approach this is to define a type in wkb_test.go to hold a test case:

type wkbWKTPair {
    wkb string
    wkt string
}

And then define the test cases as a global variable, e.g.:

var validWKBCases = []wkbWKTPair{
    {
        wkb: "0101000000000000000000f87f000000000000f87f",
        wkt: "POINT EMPTY",
    },
    {
        // Trailing byte
        wkb: "0101000000000000000000f87f000000000000f87f00",
        wkt: "POINT EMPTY",
    },
    ...
}

Then TestWKBParseValid becomes:

func TestWKBParseValid(t *testing.T) {
    for i, tt := range validWKBCases {
        t.Run(strconv.Itoa(i), func(t *testing.T) {
            ...
        }
    }
}

And then BenchmarkWKBParse could iterate over validWKBCases in a similar way. I think this might be similar to what you were referring to by "create a wkb_util_test.go file which has the hex decoder and the PostGIS/Spatialite fixtures in it". I don't think it necessarily has to go in a separate wkb_util_test.go file, but that would be ok too.

Secondly, the hexStringToBytes/decode definitions. Is the reason that you had to copy these because hexStringToBytes accepts a t *testing.T? (which you don't have access to in a benchmark?). If so, I think that could be solved by changing the signature from hexStringToBytes(t *testing.T, s string) to hexStringToBytes(tb testing.TB, s string). That way you can either pass in a *testing.T or a *testing.B, depending on whether it's called from a unit test or a benchmark.

Oct 27 '21 19:10 peterstace

Hi @peterstace sorry for the long delay, this should be good to go now.

I wasn't able to figure out a nice way of getting the call to hexStringToBytes() outside the benchmark codepath, although I'm not super concerned as it will be equal between runs.

One solution would be to add a []byte field to the test struct, along with the two strings, let me know if you'd like to go down that route.

Nov 03 '22 13:11 missinglink

In regards to code organization, are you happy with this geom/wkb_bench_test.go file or would you prefer to move the BenchmarkWKBParse into geom/wkb_test.go?

I know some people prefer to have their benchmarks separated from the unit tests, while others prefer to keep them together, I'm happy to follow whatever convention you prefer.

Nov 03 '22 13:11 missinglink

although I'm not super concerned as it will be equal between runs.

Agreed, I'm not concerned either. Any differences in the benchmarked functionality would still show through as changes since the overhead would be unchanged.

In regards to code organization, are you happy with this

I'm happy with how it is -- most of the benchmarks are in separate files to regular unit tests, so that fits in well with what you've done.

Nov 06 '22 09:11 peterstace