goavro icon indicating copy to clipboard operation
goavro copied to clipboard

Make TextualFromNative output deterministic

Open galion96 opened this issue 2 years ago • 4 comments

This is my take on resolving the #222. I used a simple approach of extracting the keys and sorting them in a deterministic way. Hopefully this is ok or can add some perspective to this issue

galion96 avatar Aug 02 '22 11:08 galion96

Hi and thanks! I'm worried about this having a perf impact. goavro has been very focused on perf. Go has a decent framework for running benchmarks. Would you please include a benchmark test with this PR as in this PR #223

xmcqueen avatar Aug 19 '22 15:08 xmcqueen

Understood, will do as soon as I find available time

galion96 avatar Aug 19 '22 17:08 galion96

On my branch

goavro % go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             20088 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3460976 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1127675 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3291675 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100          10312088 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7148802 ns/op
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16                    100               391.8 ns/op

On master

go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             19934 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3365947 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1146937 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3351293 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100           6815802 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7874666 ns/op
{"f1":{"k1":3.5}}
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16               {"f1":{"k1":3.5}}
     100               403.8 ns/op

@xmcqueen What did you had in mind, do you want me to add some really large map with 1M records and compare or?

galion96 avatar Aug 20 '22 14:08 galion96

Another way to make it deterministic that would maybe be faster than sorting the keys would be to keep the order of the fields in the records in the schema. It would be consistent with what produce Avro library for Java and also with avro-tools.

Here I think the function should should use nameFromIndex from the Record codec to loop through the names instead of using "range destMap" and that would solve the problem for records.

For map, of course there is no definitive order. The Java implementation uses LinkedHashMap to preserve insertion order. Maybe this project could switch to something similar.

ckarmann avatar Oct 31 '22 05:10 ckarmann