apm-agent-rum-js

Handling huge cardinality of page load transaction names

Open hmdhk opened this issue 6 years ago • 4 comments

A little background on the issue: we used to include the page URL (without the query string) as the transaction name by default. Even without the query string we had the same cardinality problem, because a lot of applications simply include ids in their URLs.

Currently the default transaction name is unknown, not the page URL, so all page load transactions are grouped under unknown by default. There is an API that lets users set the initial page load transaction name, and setting it to the page URL is probably the easiest way to name the page. This can create a large number of transaction names.

Some solutions:

  • Having a url pattern config option that we use to set the transaction name to the page url
  • Implementing heuristics on the agent that try to detect ids in the URL (I've made a POC)
  • Implementing a grouping algorithm on the Kibana side.
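To make the heuristic option concrete, here is a minimal sketch of id detection on path segments. It is not the POC mentioned above; the function names (`looksLikeId`, `normalizePath`), the `:id` placeholder, and the specific rules (all digits, UUID, long hex string) are assumptions chosen for illustration.

```typescript
// Hypothetical heuristic, not the agent's actual implementation:
// treat a path segment as an id if it is all digits, a UUID, or a long hex string.
function looksLikeId(segment: string): boolean {
  if (segment.length === 0) return false;
  // Purely numeric, e.g. "12345"
  if (/^\d+$/.test(segment)) return true;
  // UUID, e.g. "123e4567-e89b-12d3-a456-426614174000"
  if (/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(segment)) return true;
  // Long hex string (hashes, database object ids); the 16-char threshold is an assumption
  if (/^[0-9a-f]{16,}$/i.test(segment)) return true;
  return false;
}

// Drop the query string and fragment from a path such as "/a/b?x=1#top".
function stripQueryAndHash(path: string): string {
  let end = path.length;
  for (const marker of ["?", "#"]) {
    const index = path.indexOf(marker);
    if (index !== -1 && index < end) end = index;
  }
  return path.slice(0, end);
}

// Replace every id-like segment with the ":id" placeholder (an assumed convention).
function normalizePath(path: string): string {
  return stripQueryAndHash(path)
    .split("/")
    .map((segment) => (looksLikeId(segment) ? ":id" : segment))
    .join("/");
}
```

A heuristic like this will produce false positives (for example, a year in `/blog/2018` is replaced too), which is one reason a user-defined pattern option may still be needed alongside it.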

hmdhk avatar Sep 21 '18 08:09 hmdhk

I think we could do a two step approach here:

  1. Improve the API and add a method, in addition to setInitialPageLoadName(), that takes the URL directly and strips the query string. We could instead provide a helper method or instruct the user how to do it, but I would prefer having a specific function for it.

  2. Provide a way to initialize a list of match patterns that the user defines in order to strip the parameters embedded in the URL. I would try to use simple matching patterns and avoid regex for simplicity. This matching could be done either in the agent or on the server, but a single match per page load in the agent does not seem like a big deal, and we potentially save resources on the server.
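As a rough sketch of how the two steps could fit together in the agent: the pattern syntax (`*` matches exactly one path segment), the function names, and the fallback to the stripped path are all assumptions for illustration, not an existing agent API.

```typescript
// Step 1 (hypothetical helper): drop the query string and fragment.
function stripQuery(url: string): string {
  let end = url.length;
  for (const marker of ["?", "#"]) {
    const index = url.indexOf(marker);
    if (index !== -1 && index < end) end = index;
  }
  return url.slice(0, end);
}

// Step 2 (hypothetical helper): regex-free matching where "*" stands for one segment.
function matchPattern(path: string, pattern: string): boolean {
  const pathSegments = path.split("/");
  const patternSegments = pattern.split("/");
  if (pathSegments.length !== patternSegments.length) return false;
  return patternSegments.every(
    (segment, i) => segment === "*" || segment === pathSegments[i]
  );
}

// Use the first matching pattern as the transaction name;
// fall back to the query-stripped path when nothing matches.
function transactionNameFor(url: string, patterns: string[]): string {
  const path = stripQuery(url);
  for (const pattern of patterns) {
    if (matchPattern(path, pattern)) return pattern;
  }
  return path;
}

// Example: user-defined patterns, checked in order.
const patterns = ["/users/*/orders/*", "/users/*"];
const name = transactionNameFor("/users/42/orders/7?sort=asc", patterns);
```

The result could then be passed to setInitialPageLoadName(). The sketch assumes path-only input; a full URL with an origin would need its path extracted first, and details such as trailing slashes are left unhandled.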

Also, the default behaviour could probably be changed to step 1, based on the URL. In my opinion that is better than the current unknown. cc @roncohen

alvarolobato avatar Sep 21 '18 10:09 alvarolobato

I would try to use simple matching patterns and avoid using regex for simplicity.

I like minimatch for these use cases: https://github.com/isaacs/minimatch#minimatch

sorenlouv avatar Sep 21 '18 12:09 sorenlouv

Related to https://github.com/elastic/kibana/issues/26544

alvarolobato avatar Dec 04 '18 13:12 alvarolobato

We had a meeting around sampling and high cardinality:

  • We discussed storage and network traffic reduction
  • For storage we should look into aggregation and trimming data (e.g. removing spans for older transactions)
  • For network traffic
    • we still need to have sampling in some form or another
    • we discussed providing config options to let the user decide which transactions are important (this can be provided through central config)
    • Another idea is to crawl the website and discover the urls and let the user choose in the UI
  • High cardinality issue
    • We will provide a config option to let the user specify the url pattern (this can be configured in central config or in apm-server) -> issue
    • we discussed a heuristic based solution (POC)
    • we also discussed using machine learning to categorise url sections (I will do a POC on this)

cc @axw , @drewpost @vigneshshanmugam

hmdhk avatar Jun 10 '20 10:06 hmdhk