elastic
elastic copied to clipboard
bulk: Add easyjson bindings
Makes it possible to override the decoder and use easyjson effectively with the bulk service
type EasyJsonDecoder struct {
}
func (e *EasyJsonDecoder) Decode(b []byte, t interface{}) error {
if v, ok := t.(easyjson.Unmarshaler); ok {
return easyjson.Unmarshal(b, v)
}
return json.Unmarshal(b, t)
}
elastic.NewClient(elastic.SetDecoder(&EasyJsonDecoder{}))
Thanks for your commit.
I probably won't add anything like this to the core library. Last time I checked the cost of deserialization was neglectable compared to the I/O and network latency of a bulk request.
But I'd be happy to find a way for users to opt-in to easyjson if there is a way to do so orthogonally. I haven't thought about it though. What could be done easily is to provide your own easyjson-based Bulk service, and simply using Client.PerformRequest to do the low-level stuff. But maybe there's an even simpler way.
If I were to add this to the core library, it should be go-generated like e.g. the BulkIndexRequest (see here and users should be able to opt-in with something like UseEasyJSON().
Standard library encoding / json deserialization cost is expensive
@olivere Not sure if I misunderstood or you did, the easyjson bit is strictly opt-in - it requires you to override the decoder to support easyjson (like in my comment above)
Other than adding the go:generate line, is there something else that could be done to sway your opinion of not merging this?
@wuurrd I'm sorry if it sounded too harsh. It's always hard for me to simply close PRs, as a discussion would be beneficial most of the time.
EasyJSON is such a topic. There are different perspectives to this. E.g. do I want to allow Elastic to opt-in to use something like easyjson? Yes, definitely. Do I want to add easyjson to elastic natively? No, in general I don't. Using it for bulk indexing is an exception to the rule, and I'm still not sure if it was worthwhile to add it to elastic in the first place.
So if you want to use easyjson decoding with elastic, you could put the EasyJSON decoder into your code, and it doesn't have to be in elastic at all. If I put it into elastic, that'd communicate the fact that I would support EasyJSON in the library, which I don't (with bulk indexing being an exception).
Concerning the deserialization of BulkResponse with elastic, I might still be open to do that. However, it needs to be done in a way that is compatible with/similar to what we already have with the other structures that support EasyJSON, i.e. add it in a similar fashion as BulkIndexRequest and with go:generate. Just dropping the deserialized code into the library would not be the right approach, we need to generate it. After all, the EasyJSON package might change over time and so the generted serialization/deserialization code will change as well. I'm afraid I haven't looked into how this should be done, but I'd be open to discuss how to do this for BulkResponse.
Standard library encoding / json deserialization cost is expensive
@zplzpl Compared to what? Protobuf encoding? Yes. Network latency? Not at all.
Don't get me wrong, but microbenchmarks won't help here. Just shaving off a few microseconds from serialization/deserialization won't help too much as you neglect I/O latency, which--last time I looked--was orders of a magnitude slower in terms of performance that encoding/decoding. I need to look at that again though, but I'd be surprised if that changed recently.
@olivere Not harsh at all, very impressed with the work you've done with this library, and totally understand that you don't want to force easyjson on users without them wanting to use it in the first place.
My first idea was to use easyjson externally, but unfortunately there are a few problems with that
- Running easyjson without generating the _easyjson.go file does not give any performance benefits
- Generating easyjson bindings outside of the elastic package won't allow
UnmarshalEasyJSONfuncs to work (requires manual tweaking of the outputted file) - If we generate the easyjson bindings outside of the elastic package we struggle with elastic.shardsInfo not being exposed (requires substantial tweaking of the generated file to ignore the _shards field)
Do you have workarounds for the above issues (that don't involve using a forked elastic repo)?
Re: the above performance stuff- when profiling I see approximately 20% of total CPU time spent in unmarshalling json BulkResponse objects in an application that reads from Kafka, protobuf decodes, and bulk inserts into ES - local patch with this branch takes that number down to < 5% CPU time (and increases ingest rate by about 15%
@wuurrd Okay, I will look into this. Opening up shardsInfo is trivial to do, so that wouldn't be a problem. I hope it's not time critical for you, as it's not top priority for me now tbh (looking into 7.0 alpha right now).
I think I will have to review the performance stuff. Am I right that Bulk API is of highest interest for you?
@olivere Yes, the Bulk API performance is the most performance critical bit of this library we use.
- as an aside, (and so sorry for adding a feature request as part of this), but another thing that would be fantastic for us would be to hook a channel / a callback when the health changes - so that we can stop consuming messages from kafka when that happens. ^ I can make an attempt at the above and create a separate PR.
You are not at GOTO conference in Copenhagen this week just by coincidence? Otherwise we could discuss these things in person.
github.com/json-iterator/go
This project is also very interesting, I hope to draw your attention.
| ns/op | allocation bytes | allocation times |
|---|---|---|
| std decode | 35510 ns/op | 1960 B/op |
| easyjson decode | 8499 ns/op | 160 B/op |
| jsoniter decode | 5623 ns/op | 160 B/op |
| std encode | 2213 ns/op | 712 B/op |
| easyjson encode | 883 ns/op | 576 B/op |
| jsoniter encode | 837 ns/op | 384 B/op |
It’s very attractive.
I think this project is a good example. https://github.com/gin-gonic/gin/tree/master/internal/json
github.com/json-iterator/go
This project is also very interesting, I hope to draw your attention.
I think this project is a good example. https://github.com/gin-gonic/gin/tree/master/internal/json
I tend to focus on the standard library whenever possible. In the long run, there's nothing coming even close to its stability, it's getting faster over time, your library doesn't turn into a turmoil of dependencies and so much more. Notice that I don't mean to turn down the authors of the libraries you mentioned: Not at all. I will definitely have a look at it. But in general, I've learned to be very hesistant when it comes to adding a new dependency.
So there seems to be some support for easyjson now? https://github.com/olivere/elastic/blob/release-branch.v7/bulk_create_request_easyjson.go