Cache JMX connection and MBeans names and attributes
Hey, I saw significant improvement with jmx connection cached. Here is a chart for sample cassandra cluster without any traffic, orange line is for an instance with patched exporter:
Cassandra Clusters with traffic have GC time and throughput decreased, and we saw latencies improved up to 50%. But this depends on the traffic patters and vary from cluster to cluster.
Implementation details:
- scraper is cached in collector's config instance, so it's replaced every time config is reloaded.
- jmx connection is cached in scraper, and it uses Cleaner API to close it when scraper is discarded.
- MBeans names and attributes are cached in scraper. JMX client subscribes to MBean register/unregister events to invalidate this cache.
- JMX is reconnected when any exception occurs. I tested only in javaagent mode, so didn't have a chance to test reconnection.
@max-melentyev Thanks for the PR! I'll need to review it.
@max-melentyev This PR changes a lot and makes some assumptions that I don't feel are always valid.
An MBeanServerConnection can register for MBean registration/unregistration notification events, but the PR doesn't appear to handle the scenario where a DynamicMBean with dynamic attributes is in use.
Thoughts?
Hey, I'm not very familiar with java. I've tested this PR with cassandra and it produced the same result as a version from trunk. Cassandra adds some new metrics during bootstrap and when new tables created so without this event handler some metrics were missing. If there's another way MBean can start generating different set of metrics without giving a notification, then it's probably not handled by the code.
There are other issues with this code though:
- Looks like local jmx connection uses singleton object, so when config is reloaded every handler is registered on the same object and they got stacked. So old handlers should be unregistered explicitly. And this most likely prevents scraper from being GC'ed.
- Cleaner api doesn't work with java 8, agent fails to load.
I started working on a fix, but we decided to use https://github.com/k8ssandra/management-api-for-apache-cassandra which showed the same performance as jmx exporter with this patch. Please let me know if you want me to push my WIP fix, or feel free to close the PR.
@max-melentyev I would say push your latest code. It may be something we should look at in the future.
Done