Metrics namespaces in 4.2 can break page cache, TX designations
metrics.namespaces.enabled=true
metrics.prefix=global
Investigation: in Neo4j 4.2, most JMX metrics aren't enabled by default, so you already have to set metrics.jmx.enabled=true for Halin to work. As of Halin 0.15.0, the software detects this and tells you to change the setting; certain monitoring widgets stay broken until it is in place.
These other settings are more problematic. Take this as a probe query:
call dbms.queryJmx("*:*") yield name where name=~ '.*page_cache.*' return name order by name;
Normally, Neo4j would return metrics like this:
"neo4j.metrics:name=global.dbms.page_cache.hits"
"neo4j.metrics:name=global.dbms.page_cache.page_faults"
But with metrics.prefix=global, the configuration also renames the JMX domain that every metric is registered under, like so:
"global.metrics:name=global.dbms.page_cache.hits"
"global.metrics:name=global.dbms.page_cache.page_faults"
This means that for Halin to get JMX metrics out of the system, it must dynamically detect the configuration and change the way it requests JMX metrics. The feature was originally developed for reporting out to other systems, but for Halin it is a non-backwards-compatible change in Neo4j. It means that monitoring systems must be sensitive to Neo4j's config. Compare a remote Prometheus system, which is configured to simply absorb whatever is sent to it, with issues like the namespace delegated to search filters.
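One way to make the lookup independent of the prefix is to run the wildcard probe above and match on the tail of the metric name rather than on a hard-coded object name. The following is a minimal sketch in Python, not Halin's actual code: the function names are hypothetical, and it assumes the object names keep the "<domain>:name=<metric>" shape shown in the examples above, with a domain that ends in ".metrics".

```python
import re

# Assumed shape of the names returned by the probe query:
# "<domain>:name=<metric name>", as in the examples above.
_OBJECT_NAME = re.compile(r'^(?P<domain>[^:]+):name=(?P<metric>.+)$')


def parse_object_name(object_name):
    """Split a JMX object name into (domain, metric); None if it has another shape."""
    match = _OBJECT_NAME.match(object_name.strip('"'))
    if match is None:
        return None
    return match.group("domain"), match.group("metric")


def detect_metrics_domain(object_names):
    """Return the first domain ending in '.metrics' among the probe results, else None."""
    for object_name in object_names:
        parsed = parse_object_name(object_name)
        if parsed and parsed[0].endswith(".metrics"):
            return parsed[0]
    return None


def find_metric(object_names, suffix):
    """Return the first object name whose metric part ends with suffix, else None."""
    for object_name in object_names:
        parsed = parse_object_name(object_name)
        if parsed and parsed[1].endswith(suffix):
            return object_name.strip('"')
    return None


# Usage with the names from the metrics.prefix=global example.
probe_results = [
    '"global.metrics:name=global.dbms.page_cache.hits"',
    '"global.metrics:name=global.dbms.page_cache.page_faults"',
]
domain = detect_metrics_domain(probe_results)
hits = find_metric(probe_results, ".page_cache.hits")
```

Matching on the suffix means the same lookup works whether the domain is neo4j.metrics or global.metrics, at the cost of running the broad probe once before asking for specific metrics.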
The Neo4j 4.2 docs don't really discuss some of these issues. The configuration properties are mentioned, but the docs don't explain what namespacing does and give no examples (ref here: https://neo4j.com/docs/operations-manual/current/monitoring/metrics/expose/). Additionally, Neo4j 4.2 includes a new "filter" capability that lets you selectively enable or disable certain metrics. The docs don't say this, but the filter appears to apply only to CSV logging, not to what the JMX layer reports back. There is also no list of available metrics to drive what your filter can be, so you need pretty deep knowledge of the available metrics (i.e. from a previous version of the software) to use the filter effectively.
Currently mulling whether to try to detect & fix this situation within Halin, or whether the components that require JMX data should simply break with a message saying that the issue is due to metrics configuration.
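For the second option, failing with a message, the check could look roughly like the sketch below. It is written in Python for illustration only; the function names and the message wording are hypothetical, and the input is assumed to be the list of names returned by the probe query above.

```python
def check_required_metrics(object_names, required_suffixes):
    """Map each required suffix to the matching object name, or None when missing."""
    found = {}
    for suffix in required_suffixes:
        found[suffix] = next(
            (name for name in object_names if name.endswith(suffix)), None
        )
    return found


def missing_metrics_message(found):
    """Build a message naming the missing metrics, or None if all are present."""
    missing = sorted(suffix for suffix, name in found.items() if name is None)
    if not missing:
        return None
    return (
        "JMX metrics not found: " + ", ".join(missing)
        + ". This may be due to metrics configuration; check metrics.jmx.enabled, "
        "metrics.prefix and metrics.namespaces.enabled."
    )


# Usage: only one of the two page cache metrics is visible.
names = ["global.metrics:name=global.dbms.page_cache.hits"]
found = check_required_metrics(names, [".page_cache.hits", ".page_cache.page_faults"])
message = missing_metrics_message(found)
```

A component that depends on JMX data could show this message in place of its widget instead of failing silently.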