zap icon indicating copy to clipboard operation
zap copied to clipboard

.With() function repeating keys/values

Open henriquechehad opened this issue 7 years ago • 4 comments

Using .With() function it's repeating the same key/value for every With call. Should it overwrite the previous values or have a function to remove specific keys/values previously set?


import "go.uber.org/zap"

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()
	sugar := logger.Sugar()

	sugar = sugar.With("test", "value")
	sugar.Info("message 01")

	// prints:
	// {"level":"info","ts":1534362222.643982,"caller":"tmp/main.go:11","msg":"message 01","test":"value"}

	sugar = sugar.With("test", "value")
	sugar.Info("message 02")

	// prints duplicated "test" "value"
	// {"level":"info","ts":1534362222.644039,"caller":"tmp/main.go:14","msg":"message 02","test":"value","test":"value"}
}

henriquechehad avatar Aug 15 '18 19:08 henriquechehad

This is as designed, and is documented in NewJSONEncoder. It's technically allowable by the JSON specification, and all deserialization code that I'm aware of (including Go's standard library) preserves only the last value.

Supporting a last-writer-wins policy as you suggest is remarkably difficult in zap (and similar projects) - zap is fast because it encodes fields as they're added, without maintaining some intermediate representation (commonly map[string]interface{}). It's technically possible, and could even be made reasonably fast in the case when there are no duplicates, but we haven't run into many situations where it's valuable - usually we run into this when two developers are mistakenly stomping on each others' log data, so keeping both values in the output makes debugging easier.

In short, this is functioning as designed. If you're interested in a fairly complex PR, I can guide you through how we might implement this feature and benchmark the performance impact.

akshayjshah avatar Aug 28 '18 21:08 akshayjshah

Just a note:

This seems to be an issue when sending said logs to GCP's stackdriver. It takes the duplicated fields and concats them.

{
	"trace": "foo",
	"trace": "foo"
}

will be presented in stackdriver as

{
	"trace": "foofoo"
}

geoah avatar Mar 03 '21 10:03 geoah

Huh! I'm a little surprised by this, but perhaps Google's also trying to avoid dropping the duplicate data. I'm no longer at Uber, so the current maintainers have the final say on whether to make any changes to the current duplicate-handling code.

If this is particularly inconvenient for you, you can wrap zap's JSON encoder and fix it yourself. To keep performance reasonably good, you could use a Bloom filter:

  • The JSON encoder can keep track of all keys seen so far in an uint64 Bloom filter.
  • If it sees a key that's possibly a duplicate, it can unmarshal the JSON accumulated so far into a map[string]json.RawMessage, overwrite the existing data (if any), and re-serialize it.

That would keep serialization reasonably fast and zero-allocation when there are no duplicates. I haven't thought through how you'd handle duplicates in nested objects (created by zap.Namespace).

akshayjshah avatar Mar 03 '21 19:03 akshayjshah