What is the future for dotnet-counters list?
Background and Motivation
When we first added the dotnet-counters list command it was a quick-and-dirty stop-gap to help people understand the small number of counters that existed in the runtime. There were no docs or dynamic discovery mechanisms to rely on. Over time this mechanism has grown in complexity as it tries to track which counters are available in different .NET versions, and the number of counters keeps increasing. The data encoded in the tool is effectively a second, parallel set of documentation on top of the official docs. This creates the standard risk that the two copies will drift apart, and it doubles the amount of work needed to make updates. The list mechanism also leaves out all counters in assemblies that aren't part of the .NET core product, so 3rd-party counters aren't discoverable.
Proposed Feature
We should decide what direction we want dotnet-counters list to go. A few potential ideas:
- We replace the content with a link to the official docs
- We deprecate the list command and replace it with some feature that does dynamic discovery
- We preserve the list command and make a plan for how it will stay up-to-date
- Something else?
The goal of this issue is only to decide what we want to do and describe our plan in some doc or issue. Executing on that plan is a follow up task to come later.
I would prefer that there is a mechanism for dynamic discovery, as that will help users in a couple of ways:
- They will be able to get a more accurate list for the runtime, rather than relying on what is hard-coded into the tool
- It enables application developers to discover the counters they have referenced from their own code and from 3rd-party libraries.
- It helps them diagnose whether their own counters are working correctly.
I look at dotnet-counters as mostly an interactive diagnostic tool rather than something that is going to be used as part of an end-to-end monitoring solution, so having to type in the names is more of an inconvenience than a benefit. Maybe having a wildcard feature for the provider/counter name would be the most useful. For example:
dotnet-counters monitor -n loadtest --counters *
dotnet-counters monitor -n loadtest --counters "Microsoft.*,System.Net.*"
So you could just use * as a complete wildcard. It may also be necessary to change the output of the interactive version to include the event name rather than just the description, so that users know what to feed back in to be more specific about the output.
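The wildcard semantics described above could be implemented with ordinary glob matching. A minimal Python sketch of one possible interpretation (match_providers is a hypothetical helper, not part of dotnet-counters; the provider names are real, but the matching rules are an assumption):

```python
# Sketch of matching comma-separated wildcard patterns against provider
# names, as in --counters "Microsoft.*,System.Net.*". match_providers is
# a hypothetical helper illustrating the proposed semantics.
from fnmatch import fnmatchcase

def match_providers(patterns: str, providers: list[str]) -> list[str]:
    """Return the providers that match any of the comma-separated glob patterns."""
    globs = [p.strip() for p in patterns.split(",")]
    return [name for name in providers
            if any(fnmatchcase(name, g) for g in globs)]

available = ["System.Runtime", "System.Net.Sockets",
             "Microsoft.AspNetCore.Hosting", "MyApp.Metrics"]
print(match_providers("*", available))                        # matches everything
print(match_providers("Microsoft.*,System.Net.*", available))
```

The same matching could apply to individual counter names, not just providers.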
Thanks @samsp-msft! I agree that dynamic discovery feels like the best overall user experience (also the most expensive to engineer, but ignoring that for now :)
Going into a little more of the UX details, I worry that the monitor command only has 30-80 rows to work with on a typical console window. Many wildcard expressions will probably overflow the available space and the user will get incomplete results. Drawing parallels from command-line text editors, one thought I had was to implement virtualized scrolling of the content. That handles the limited vertical UI space, but then we hit the next set of limits - by default MetricsEventSource only tracks at most 100 histograms and 1000 time series. A single instrument could easily have large numbers of label combinations that rapidly exhaust those limits, regardless of whether the user cared about the data from that particular counter.

We could keep going - modeling a movable UI selection cursor and interactive collapse/expand gestures on provider and counter nodes in the tree view to guide where we want to spend our limited budget of time-series data - but it felt like the engineering bill would be getting excessive at that point.
An alternative dynamic discovery approach that I hope still captures 80% of the ease of use with a lot less UI complexity is to modify the list command to take the same PID/process name arguments that collect and monitor do. In this view we could write to the console one counter per line as they are discovered. This would use standard console scrolling to handle long output and because we are only capturing the counter metadata rather than values we don't have to worry about the time-series limit. Output might look like:
> dotnet-counters list -n loadtest --counters "System.*"
Provider            Counter         Description
--------            -------         -----------
System.Runtime      cpu-usage       The percent of process' CPU usage relative to all
                                    of the system CPU resources [0-100]
System.Runtime      working-set     Amount of working set used by the process (MB)
System.Runtime      gc-heap-size    Total heap size reported by the GC (MB)
[more counters elided]
System.Net.Sockets  bytes-sent      Total number of bytes transmitted via any System.Net.Socket
System.Net.Sockets  bytes-received  Total number of bytes received via any System.Net.Socket
[more counters elided]
Another advantage of this approach is that we see the description text, which wouldn't be available in dotnet-counters monitor output. One annoyance of this approach is that, since we write the counters out as soon as they are discovered, the list order is determined by the order they are created in the process. If we want alphabetical order I think that means either:
- we have some arbitrary cutoff point for discovery and counters created after that point won't be shown.
- we are back to a dynamically sized, dynamically updating scrollable virtual text buffer so that we can insert new counters in sorted order as they are created.
I'm not convinced that complexity is worth it but calling it out if you or others feel differently.
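For what it's worth, the bookkeeping for the second option is cheap on its own; the cost is in the console redraw. A minimal Python sketch of just the buffer side (SortedCounterBuffer is a hypothetical name; the rendering is not shown):

```python
# Sketch of keeping discovered counters in alphabetical order even though
# they arrive in creation order. Only the buffer bookkeeping is shown; the
# console repaint below the insertion point is the actual hard part.
import bisect

class SortedCounterBuffer:
    def __init__(self):
        # (provider, counter) tuples, always kept sorted
        self._rows: list[tuple[str, str]] = []

    def on_discovered(self, provider: str, counter: str) -> int:
        """Insert a newly discovered counter in sorted order and return the
        row index, so a renderer would know where to repaint from."""
        row = (provider, counter)
        index = bisect.bisect_left(self._rows, row)
        self._rows.insert(index, row)
        return index

    def rows(self) -> list[tuple[str, str]]:
        return list(self._rows)

buf = SortedCounterBuffer()
buf.on_discovered("System.Runtime", "working-set")
buf.on_discovered("System.Net.Sockets", "bytes-sent")  # inserts at row 0
buf.on_discovered("System.Runtime", "cpu-usage")       # inserts between the two
print(buf.rows())
```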
One last thing for now is to call out that the MetricsEventSource that exposes data recorded with the .NET 6 Meter API already has the underlying support for enumerating counters, by listening to the InstrumentPublished event. However, data recorded using the EventCounters API has no current out-of-process enumeration mechanism. Our options are either:
- Automatically map EventCounters to Meter/Instrument in-proc so we can use the MetricEventSource enumeration capability.
- Add new EventSource enumeration capabilities to EventPipe, and EventCounter enumeration capability to EventSource.
At the moment I think the EventCounters -> Meter/Instrument mapping is the cheaper of the two to implement, and it has some nice additional benefits beyond just this scenario. However, regardless of which approach we take, dynamic enumeration of counters won't work well when targeting .NET 6 or 7, so we probably have to detect what version of the runtime we are attaching to and fall back to the current behavior for those versions.
Thoughts?
Managing metrics when they have any number of dimensions or the dimensions have more than a couple of values is going to be challenging with a console tool. So this brings up the real question - what is the purpose and remit of the dotnet-counters tool? Is it to monitor services in production? Help with testing your app?
We probably need a syntax for being able to specify what dimensions you want to view for each counter. The default should be to show all dimensions, but have the syntax to be able to filter to specific dimensions and values. Something like:
dotnet-counters collect -providers provider_name[counter_one, counter_two(mydimension), counter_three(mydimension=value), counter_four()]
Showing all dimensions by default aids in discovery, and so you don't have to filter them out, unless it becomes too noisy. In the above:
- all dimensions for counter_one would be shown
- only `mydimension` would be shown for counter_two
- only the specified value of `mydimension` for counter_three
- no dimensions for counter_four
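That syntax could be parsed with a small amount of pattern matching. A Python sketch of a hypothetical parser, with the semantics assumed from the bullets above (no parentheses = all dimensions, `(dim)` = that dimension only, `(dim=value)` = that value only, `()` = no dimensions):

```python
# Hypothetical parser for the proposed dimension-filter syntax:
#   provider[counter_one, counter_two(dim), counter_three(dim=value), counter_four()]
import re

def parse_counter_spec(spec: str):
    """Parse a provider spec into (provider, {counter: filter}) per the
    semantics assumed above."""
    provider, _, rest = spec.partition("[")
    counters = {}
    for name, parens in re.findall(r"([\w-]+)\s*(\([^)]*\))?", rest.rstrip("]")):
        if not parens:
            counters[name] = {"mode": "all"}      # show all dimensions
        elif parens == "()":
            counters[name] = {"mode": "none"}     # show no dimensions
        elif "=" in parens:
            dim, value = parens.strip("()").split("=", 1)
            counters[name] = {"mode": "value", "dim": dim, "value": value}
        else:
            counters[name] = {"mode": "dim", "dim": parens.strip("()")}
    return provider, counters

spec = "provider_name[counter_one, counter_two(mydimension), counter_three(mydimension=value), counter_four()]"
provider, counters = parse_counter_spec(spec)
print(provider)
print(counters)
```

A real implementation would also need to handle quoting and error reporting, but the grammar itself is simple enough to keep the command line ergonomic.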
Having `list` as a separate command to enumerate counters is a good compromise for being able to collect the data. I created an in-proc demo along with events, which shows potential in the concept.
The `list` function could either be dynamic and emit the results as they occur, and/or have an output-file option to write the results to a file when collection has ended. I can see value in both, and they can share the same infrastructure - it's just a question of when to report the results.
> what is the purpose and remit of the dotnet-counters tool? Is it to monitor services in production? Help with testing your app?
I think of it serving a few roles:
- Quickly verify that an app is producing expected metrics, such as when doing a code change to add metrics or when troubleshooting missing metrics data.
- Ad-hoc viewing of metrics when that is useful to troubleshoot some problem in the app or when initially learning about metrics.
- It can be used as a simple monitor for production services, likely by scripting it and using the 'collect' command instead of 'monitor' to write data to a file. I expect most users would be better served by a more comprehensive solution such as using OpenTelemetry, dotnet-monitor and/or Prometheus.