apm-agent-rum-js
Handling huge cardinality of page load transaction names
A little bit of background on the issue: we used to include the page URL (without the query string) as the transaction name by default. We had the same cardinality problem even without the query string, because a lot of applications simply include IDs in their URLs.
Currently the default transaction name is `unknown` and not the page URL, so all transactions are grouped under `unknown` by default. However, there is an API that lets users set the initial page load transaction name, and setting it to the page URL is probably the easiest way for a user to name the page. This can create a large number of transaction names.
Some solutions:
- Adding a URL pattern config option that we use to derive the transaction name from the page URL
- Implementing heuristics on the agent that try to detect IDs in the URL (I've made a POC)
- Implementing a grouping algorithm on the Kibana side
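To make the second option more concrete, here is a minimal sketch of what an ID-detecting heuristic might look like. This is not the POC mentioned above; the names `ID_PATTERNS`, `looksLikeId` and `groupPath` are hypothetical, and the rules (numeric segments, UUIDs, long hex strings) are only assumptions about what commonly counts as an ID.

```javascript
// Hypothetical helpers, not part of the agent API.
// Assumption: a path segment "looks like an ID" if it matches one of these.
const ID_PATTERNS = [
  /^\d+$/, // purely numeric, e.g. 42
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, // UUID
  /^[0-9a-f]{16,}$/i, // long hex string, e.g. a hash or object id
];

function looksLikeId(segment) {
  return ID_PATTERNS.some((re) => re.test(segment));
}

// Replace every ID-like segment with "*" so that /users/42 and /users/97
// end up under the same transaction name.
function groupPath(pathname) {
  return pathname
    .split('/')
    .map((segment) => (looksLikeId(segment) ? '*' : segment))
    .join('/');
}
```

With these rules, `groupPath('/users/42/orders')` would give `/users/*/orders`, while a path such as `/blog/hello-world` is left as it is. A heuristic like this will misclassify some segments (short slugs that happen to be numeric, for example), which is one reason to combine it with user-defined patterns.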
I think we could do a two step approach here:
- Improve the API and add a method, in addition to `setInitialPageLoadName()`, that would directly take the URL and strip the query string. We could instead provide a helper method to do it or instruct the user how to do it, but I would prefer having a specific function for it.
- Allow the user to define a list of match patterns that are used to strip the parameters embedded in the URL. I would try to use simple matching patterns and avoid regex for simplicity. This matching could be done either in the agent or on the server, but a single match per page load in the agent does not seem like a big deal, and we potentially save resources on the server.
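The two steps could be sketched roughly as follows. Everything here is an assumption for illustration: `stripQueryString`, `matchesPattern` and `transactionNameFor` are hypothetical names, and the pattern syntax (`:name` or `*` matching exactly one non-empty path segment) is just one possible "simple matching pattern", not something the agent defines.

```javascript
// Step 1 (hypothetical helper): drop the query string and fragment.
function stripQueryString(url) {
  return url.split(/[?#]/)[0];
}

// Step 2 (hypothetical helper): segment-by-segment matching without regex.
// Assumed syntax: ":name" or "*" matches exactly one non-empty segment,
// anything else must match literally.
function matchesPattern(pattern, pathname) {
  const patternParts = pattern.split('/');
  const pathParts = pathname.split('/');
  if (patternParts.length !== pathParts.length) return false;
  return patternParts.every((part, i) => {
    const isWildcard = part === '*' || part.startsWith(':');
    return isWildcard ? pathParts[i].length > 0 : part === pathParts[i];
  });
}

// Use the first matching pattern as the transaction name; fall back to the
// path without the query string when nothing matches.
function transactionNameFor(path, patterns) {
  const pathname = stripQueryString(path);
  const match = patterns.find((pattern) => matchesPattern(pattern, pathname));
  return match || pathname;
}
```

For example, with the pattern list `['/users/:id']`, the path `/users/42?tab=profile` would be named `/users/:id`, and the result could then be passed to `setInitialPageLoadName()`. Doing one such lookup per page load keeps the cost on the agent small.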
Also, the default behaviour could be changed, probably to step 1 (a name based on the URL). In my opinion that is better than the current `unknown`.
cc. @roncohen
> I would try to use simple matching patterns and avoid using regex for simplicity.
I like minimatch for these use cases: https://github.com/isaacs/minimatch#minimatch
Related to https://github.com/elastic/kibana/issues/26544
We had a meeting around sampling and high cardinality:
- We discussed storage and network traffic reduction
- For storage we should look into aggregation and trimming data (e.g. removing spans for older transactions)
- For network traffic:
  - we still need to have sampling in some form or another
  - we discussed providing config options to let the user decide which transactions are important (this can be provided through central config)
- Another idea is to crawl the website and discover the urls and let the user choose in the UI
- High cardinality issue
cc @axw , @drewpost @vigneshshanmugam