client_golang icon indicating copy to clipboard operation
client_golang copied to clipboard

Client API: don't read entire response into a buffer before parsing it

Open bboreham opened this issue 3 years ago • 9 comments

The API client functions all follow this pattern, where they make an http call to get back a []byte then decode the JSON: https://github.com/prometheus/client_golang/blob/f63e219e6b9074f8a55c8475e7b11720bdfc3737/api/prometheus/v1/api.go#L830-L836

For larger responses, this buffer gets quite expensive (see #976).

I propose that instead we parse JSON from the response body as it comes in. I can see that this would make handling timeouts more complicated.

bboreham avatar Jan 24 '22 14:01 bboreham

Thanks for bringing it up. I'm happy to accept any changes for the API part. It is still considered experimental and we have wiggle room to experiment. @bboreham Do you need a hand on this or do you plan to take care of it?

kakkoyun avatar Jan 25 '22 10:01 kakkoyun

Right now I don't expect to start on this. It's quite a big job and not critical to what I'm doing.

There was some discussion in #763 of code-generating the API, which seems an attractive way to make such structural changes.

bboreham avatar Jan 25 '22 10:01 bboreham

Hey @kakkoyun, can I take this up?

This is basically changing the methods to return parsed data instead of []byte, or something else?

importhuman avatar Jun 10 '22 13:06 importhuman

The current code calls h.client.DoGetFallback() which returns a []byte, via bytes.Buffer.ReadFrom(). (This call is made in multiple places, for each different API method).

My idea was to parse JSON from resp.Body, and not create the buffer.

bboreham avatar Jun 10 '22 17:06 bboreham

The []byte is being returned by a Do method in client.go. Did you mean changing this method to just return the response and not return []byte (and consequently wherever changes are required), or did I go too deep?

Full flow through which I reached here: DoGetFallback -> apiClientImpl.Do -> httpClient.Do

importhuman avatar Jun 14 '22 16:06 importhuman

Had some discussion on this thread on Slack that the aim is likely to parse resp.Body directly into JSON, would like to clarify some more points:

  • Should the unmarshalling be done as early as possible when the actual HTTP request is made, or as late as possible?
  • Should there be new methods that use this signature, or should existing ones be modified?

cc @kakkoyun @bboreham

importhuman avatar Jun 20 '22 13:06 importhuman

@importhuman

  • Should the unmarshalling be done as early as possible when the actual HTTP request is made, or as late as possible?
  • Should there be new methods that use this signature, or should existing ones be modified?

I think we need to check the codebase for this.

If the result of this method is always consumed as JSON, we should unmarshal once and as early as possible.

In any case, we should do this by introducing a new method and marking older methods as deprecated. We can remove them with v2. In this way, we can make sure we don't break compatibility for external consumers.

kakkoyun avatar Jun 20 '22 13:06 kakkoyun

The main API is methods like:

func (h *httpAPI) Query(ctx context.Context, query string, ts time.Time) (model.Value, Warnings, error) {

this returns structured data, not JSON.

My proposal is to change how we arrive at the structured data.

No public API needs to change to achieve this.

bboreham avatar Jun 20 '22 16:06 bboreham

I've just noticed that the code calls the JSON parser twice to read each request. Once in `apiClientImpl.Do: https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L1140

and once in the individual API methods like Query: https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L859

Then, different data types make nested calls: https://github.com/prometheus/client_golang/blob/618194de6ad3db637313666104533639011b470d/api/prometheus/v1/api.go#L584

I suspect the first two could be merged without major change, but the last kind needs the code rewritten in "streaming" style to avoid building an intermediate buffer.

bboreham avatar Sep 18 '22 14:09 bboreham