goavro
goavro copied to clipboard
Make TextualFromNative output deterministic
This is my take on resolving the #222. I used a simple approach of extracting the keys and sorting them in a deterministic way. Hopefully this is ok or can add some perspective to this issue
Hi and thanks! I'm worried about this having a perf impact. goavro has been very focused on perf. Go has a decent framework for running benchmarks. Would you please include a benchmark test with this PR as in this PR #223
Understood, will do as soon as I find available time
On my branch
goavro % go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16 100 20088 ns/op
BenchmarkNativeFromAvroUsingV2-16 100 3460976 ns/op
BenchmarkBinaryFromNativeUsingV2-16 100 1127675 ns/op
BenchmarkNativeFromBinaryUsingV2-16 100 3291675 ns/op
BenchmarkTextualFromNativeUsingV2-16 100 10312088 ns/op
BenchmarkNativeFromTextualUsingV2-16 100 7148802 ns/op
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16 100 391.8 ns/op
On master
go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16 100 19934 ns/op
BenchmarkNativeFromAvroUsingV2-16 100 3365947 ns/op
BenchmarkBinaryFromNativeUsingV2-16 100 1146937 ns/op
BenchmarkNativeFromBinaryUsingV2-16 100 3351293 ns/op
BenchmarkTextualFromNativeUsingV2-16 100 6815802 ns/op
BenchmarkNativeFromTextualUsingV2-16 100 7874666 ns/op
{"f1":{"k1":3.5}}
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16 {"f1":{"k1":3.5}}
100 403.8 ns/op
@xmcqueen What did you had in mind, do you want me to add some really large map with 1M records and compare or?
Another way to make it deterministic that would maybe be faster than sorting the keys would be to keep the order of the fields in the records in the schema. It would be consistent with what produce Avro library for Java and also with avro-tools.
Here I think the function should should use nameFromIndex
from the Record codec to loop through the names instead of using "range destMap" and that would solve the problem for records.
For map, of course there is no definitive order. The Java implementation uses LinkedHashMap
to preserve insertion order. Maybe this project could switch to something similar.