client_golang
client_golang copied to clipboard
Client API: don't read entire response into a buffer before parsing it
The API client functions all follow this pattern, where they make an http call to get back a []byte then decode the JSON:
https://github.com/prometheus/client_golang/blob/f63e219e6b9074f8a55c8475e7b11720bdfc3737/api/prometheus/v1/api.go#L830-L836
For larger responses, this buffer gets quite expensive (see #976).
I propose that instead we parse JSON from the response body as it comes in. I can see that this would make handling timeouts more complicated.
Thanks for bringing it up. I'm happy to accept any changes for the API part. It is still considered experimental and we have wiggle room to experiment. @bboreham Do you need a hand on this or do you plan to take care of it?
Right now I don't expect to start on this. It's quite a big job and not critical to what I'm doing.
There was some discussion in #763 of code-generating the API, which seems an attractive way to make such structural changes.
Hey @kakkoyun, can I take this up?
This is basically changing the methods to return parsed data instead of []byte, or something else?
The current code calls h.client.DoGetFallback() which returns a []byte, via bytes.Buffer.ReadFrom().
(This call is made in multiple places, for each different API method).
My idea was to parse JSON from resp.Body, and not create the buffer.
The []byte is being returned by a Do method in client.go. Did you mean changing this method to just return the response and not return []byte (and consequently wherever changes are required), or did I go too deep?
Full flow through which I reached here: DoGetFallback -> apiClientImpl.Do -> httpClient.Do
Had some discussion on this thread on Slack that the aim is likely to parse resp.Body directly into JSON, would like to clarify some more points:
- Should the unmarshalling be done as early as possible when the actual HTTP request is made, or as late as possible?
- Should there be new methods that use this signature, or should existing ones be modified?
cc @kakkoyun @bboreham
@importhuman
- Should the unmarshalling be done as early as possible when the actual HTTP request is made, or as late as possible?
- Should there be new methods that use this signature, or should existing ones be modified?
I think we need to check the codebase for this.
If the result of this method is always consumed as JSON, we should unmarshal once and as early as possible.
In any case, we should do this by introducing a new method and marking older methods as deprecated. We can remove them with v2. In this way, we can make sure we don't break compatibility for external consumers.
The main API is methods like:
func (h *httpAPI) Query(ctx context.Context, query string, ts time.Time) (model.Value, Warnings, error) {
this returns structured data, not JSON.
My proposal is to change how we arrive at the structured data.
No public API needs to change to achieve this.
I've just noticed that the code calls the JSON parser twice to read each request. Once in `apiClientImpl.Do: https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L1140
and once in the individual API methods like Query:
https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L859
Then, different data types make nested calls: https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L584
I suspect the first two could be merged without major change, but the last kind needs the code rewritten in "streaming" style to avoid building an intermediate buffer.