URL Grouping/Aggregation

Open tom-miseur opened this issue 2 years ago • 4 comments

It is often useful to aggregate endpoint URLs that contain dynamic values. This is critical in the k6 Cloud due to the limits we have in place to prevent tests from emitting too-many-metrics/too-many-urls.

The URL Grouping documentation provides a solution for k6 scripts using the http module, but because xk6-browser operates at the browser level, the user has no opportunity to apply the name tag to the requests that need it.

The situation is compounded by the fact that xk6-browser sees every HTTP request the browser makes, including requests to 3rd-party hosts that an HTTP-based k6 script would normally never contact.

Potential solutions

Allowlist/blocklist hosts in xk6-browser

A cursory browse through the Playwright docs suggests there is no convenient way of preventing/allowing requests to certain hosts, e.g. by specifying regular expressions. There is, however, a request interception mechanism involving Page.route or BrowserContext.route that could be used to abort requests that don't fit the criteria.
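A minimal sketch of the interception idea, assuming Playwright's documented `page.route`, `route.continue` and `route.abort`. The helper name `isHostAllowed` and the allowlist pattern are made up for illustration, and nothing here implies that xk6-browser exposes `route`.

```javascript
// Hypothetical helper: returns true when the request's hostname matches
// at least one allowlist pattern.
function isHostAllowed(requestUrl, allowPatterns) {
  let hostname;
  try {
    hostname = new URL(requestUrl).hostname;
  } catch (err) {
    return false; // unparsable URLs are treated as not allowed
  }
  return allowPatterns.some((pattern) => pattern.test(hostname));
}

// Illustrative allowlist: k6.io and its subdomains.
const allowPatterns = [/(^|\.)k6\.io$/];

// With Playwright, the helper could be wired in roughly like this:
//   await page.route('**/*', (route) =>
//     isHostAllowed(route.request().url(), allowPatterns)
//       ? route.continue()
//       : route.abort());
```

Keeping the decision in a plain function means the same allowlist could be reused if filtering were done after the fact instead.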

Pros:

  • would appear to support regex allow/block-listing which should be fairly easy for users to apply
  • doesn't actually send requests to undesired hosts, so no need to wait for #1321 and no need to worry about errors from 3rd party hosts (e.g. errors caused by rate limiting)

Cons:

  • users must first run into the problem and only then figure out how to resolve it

Allowlist/blocklist hosts after-the-fact

This means xk6-browser still sends requests to the additional hosts, but that traffic can be filtered out of results.

Pros:

  • the user wouldn't need to run the test again to have filtering applied

Cons:

  • requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
  • k6 OSS would need some mechanism to ignore metrics from certain hosts (#1321)
  • k6 Cloud would need to be able to filter out hosts (unless #1321 would result in k6 Cloud not receiving the metrics at all which is quite likely)

Aggregation Rules

This would involve the user specifying URL grouping regular expressions (likely in options) ahead of time. Before any metric is generated, we check if the URL matches any of the patterns and apply the transformation as necessary.

Example:

export const options = {
  aggregations: [
    { regex: 'http:\/\/ecommerce\.test\.k6\.io\/checkout\/order-received\/.*\/\?key=.*', replace: '[id]' }
  ]
}

// http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43 -> http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]
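The transformation above could be sketched as follows. This is only an illustration, not an existing k6 option: `aggregateUrl` is a hypothetical helper, and the rule uses regex literals with capture-group references (`$1`, `$2`) instead of the shorthand `replace: '[id]'`, which is an assumption made to keep the example runnable.

```javascript
// Hypothetical rule shape: `regex` captures the static parts of the URL,
// `replace` refers back to them with $1, $2, ...
const aggregations = [
  {
    regex: /^(http:\/\/ecommerce\.test\.k6\.io\/checkout\/order-received\/)[^\/]+(\/\?key=).*$/,
    replace: '$1[id]$2[id]',
  },
];

// Returns the aggregated URL for the first matching rule,
// or the URL unchanged when no rule matches.
function aggregateUrl(url, rules) {
  for (const rule of rules) {
    if (rule.regex.test(url)) {
      return url.replace(rule.regex, rule.replace);
    }
  }
  return url;
}

const original = 'http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43';
console.log(aggregateUrl(original, aggregations));
```

URLs that match no rule pass through untouched, so 3rd-party traffic would still need to be handled separately.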

Pros:

  • fairly straightforward to use; possibly even easier to implement than tagging requests with name
  • would be applicable to both http and xk6-browser
  • also solves the edge case where redirect requests contain dynamic IDs (you can apply a name tag to the request that initiates the redirect chain, but then all requests in that chain end up with the same name tag)

Cons:

  • requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
  • users must first run into the problem and only then figure out how to resolve it
  • performance is likely going to be a concern here, given that all URLs would need to be evaluated against one or more regular expressions

tom-miseur • Jun 03 '22 23:06

As mentioned over Slack, support for k6's blockHostnames option was added in #204, and released in v0.2.0. So you can give that a try right now and see if it helps.

That said, we'll still have to implement URL grouping by name, since that's currently not possible.

Using regex for this would be the more flexible option, but sticking with globbing patterns, as with blockHostnames, would be more user-friendly. Globbing would also perform better, which matters because this feature would be useful for plain k6 scripts too, where evaluating a regex for each URL might be too CPU intensive. Performance is less important for xk6-browser, since we don't make requests with nearly the same frequency, so regex might work for us as well, but globbing seems like the way to go.
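To make the globbing idea concrete, here is a rough sketch of matching a URL against a `*` pattern with plain string scans instead of a regex. The helpers `matchesGlob` and `nameFor` and the rule shape (`glob`, `name`) are hypothetical; this is not how blockHostnames is implemented.

```javascript
// Hypothetical helper: matches `url` against a pattern in which `*` stands
// for any run of characters. Uses plain string scans, no RegExp.
function matchesGlob(url, pattern) {
  const parts = pattern.split('*');
  if (parts.length === 1) return url === pattern; // no wildcard: exact match

  const first = parts[0];
  const last = parts[parts.length - 1];
  if (!url.startsWith(first)) return false;

  // Each middle part must appear, in order, after the previous one.
  let pos = first.length;
  for (const part of parts.slice(1, -1)) {
    const idx = url.indexOf(part, pos);
    if (idx === -1) return false;
    pos = idx + part.length;
  }
  // The final part must sit at the end without overlapping earlier matches.
  return url.length - last.length >= pos && url.endsWith(last);
}

// Returns the rule's name for the first matching glob, or the URL itself.
function nameFor(url, rules) {
  const rule = rules.find((r) => matchesGlob(url, r.glob));
  return rule ? rule.name : url;
}

const rules = [
  {
    glob: 'http://ecommerce.test.k6.io/checkout/order-received/*/?key=*',
    name: 'http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]',
  },
];
```

Unlike a regex rule, a glob rule cannot rewrite the URL on its own, which is why this sketch pairs each pattern with an explicit name.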

If we want to use the global options object, this will have to be implemented in k6 instead, since extensions don't have access to change it. It's worth discussing this with k6 devs, so @na--, WDYT? Would this feature also be useful for k6? If so, we should implement it there first, and then reuse the option in xk6-browser, in the same way we did for blockHostnames. If not, then this will have to be an xk6-browser-specific option, likely part of the BrowserContext options.

imiric • Jun 06 '22 08:06

Hmm, I don't have a very strong opinion here, but I'd prefer if we can avoid doing this via a new global option, at least until we have a clear idea of how to implement that optimally... 🤔

Global options are always a heavy maintenance burden over time and they are often not flexible enough to address all use cases. In some cases they are unavoidable, but in general I think we've found that programmable APIs are both easier to maintain and more flexible.

In this case, maybe a new callback in the browser.newContext() parameters could be used? I am not familiar enough with xk6-browser to know if this is a good or even possible solution; I'm just throwing it out there as a potential solution through the API instead of through the global config.
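As a rough illustration of the programmable route, the user-supplied callback might be a plain function from URL to name. The function `nameForUrl` and the `urlName` parameter mentioned in the comment are invented for this sketch; no such `browser.newContext()` parameter is known to exist.

```javascript
// Hypothetical callback: derives a low-cardinality name from a URL by
// replacing numeric path segments and all query values with "[id]".
function nameForUrl(rawUrl) {
  const url = new URL(rawUrl);
  const path = url.pathname
    .split('/')
    .map((segment) => (/^\d+$/.test(segment) ? '[id]' : segment))
    .join('/');
  const keys = [...url.searchParams.keys()];
  const query = keys.length ? '?' + keys.map((k) => `${k}=[id]`).join('&') : '';
  return `${url.origin}${path}${query}`;
}

// Purely hypothetical wiring, shown only to indicate where it might plug in:
//   const context = browser.newContext({ urlName: nameForUrl });
```

A callback like this keeps the grouping logic in user code, at the cost of running script code for every request the browser makes.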

na-- • Jun 06 '22 09:06

@dgzlopes I'm looking forward to your suggestions on this one, thanks 🙇

inancgumus • Nov 09 '22 14:11

Sorry! I somehow missed responding to this one 😞

I thought it could be interesting to have an automatic way of doing this. After all, we have the metrics data and all the URLs in k6! (at least for some time).

Maybe we could have the option to aggregate "high cardinality data" that would check the latest URLs and remove the highly changing part (and replace it with id_X or something).
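One way such automatic aggregation might work, sketched under several assumptions: path segments are tracked per position, a position is replaced with `id_X` once it has shown more than `threshold` distinct values, and query strings are ignored. `createAggregator` and the threshold are hypothetical, not something k6 provides.

```javascript
// Hypothetical factory: returns a function that rewrites URLs, replacing a
// path segment with "id_<position>" once that position has shown more than
// `threshold` distinct values under the same prefix.
function createAggregator(threshold) {
  const seen = new Map(); // prefix -> Set of segment values seen after it

  return function aggregate(rawUrl) {
    const { origin, pathname } = new URL(rawUrl);
    const normalized = [];
    for (const segment of pathname.split('/').filter(Boolean)) {
      const prefix = `${origin}/${normalized.join('/')}`;
      if (!seen.has(prefix)) seen.set(prefix, new Set());
      const values = seen.get(prefix);
      // Stop recording once the limit is exceeded so memory stays bounded.
      if (values.size <= threshold) values.add(segment);
      normalized.push(values.size > threshold ? `id_${normalized.length}` : segment);
    }
    return `${origin}/${normalized.join('/')}`;
  };
}

const aggregate = createAggregator(2);
const base = 'http://ecommerce.test.k6.io/checkout/order-received';
const results = ['121', '122', '123', '124'].map((id) => aggregate(`${base}/${id}`));
console.log(results);
```

The first few URLs are emitted unchanged because the position is not yet known to be high-cardinality, so a real implementation would need to decide whether to buffer or re-tag them.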

There is a "similar" feature in Grafana that lets you dedup Loki logs based on the signature.

dgzlopes • Dec 01 '22 12:12

Internally, if I remember correctly, we had something similar for Prometheus metric labels, too (in Python).

dgzlopes • Dec 01 '22 12:12