Add a new metric 'StolenTargetMemoryRatio' to sqlPerformanceMetrics in telegraf/plugins/inputs/sqlserver/sqlqueriesV1.go
Use Case
This metric will help identify potential memory leak introduced in sql 2019 CU14 where MEMORYCLERK_SOSNODE takes up exessive amount of available memory without releasing them, resulting in crashing the sql server. MEMORYCLERK_SOSNODE is categorized under 'Stolen Memory' in SQL Server.
UNION ALL SELECT 'Stolen target memory ratio', StolenTargetMemoryRatio = 100.0 * cntr_value / (SELECT 1.0*cntr_value FROM sys.dm_os_performance_counters WHERE counter_name = 'Target Server Memory (KB)') FROM sys.dm_os_performance_counters WHERE counter_name = 'Stolen Server Memory (KB)
Expected behavior
When the value of Stolen/Target memory goes above 50% or even 60% it can be considered alarming for DBA.
Actual behavior
it returns the ratio of stolen/target memory in sql server.
Additional info
https://learn.microsoft.com/en-us/answers/questions/826973/what-can-cause-memoryclerk-sosnode-clerk-gradually
@Trovalo does that make sense to you?
@chenteddybear a pull-request to add this is welcome... ;-)
it costs nothing to add this counter, it's just an additional string in the WHERE filter of the perfCounter query.
About how useful it might be... I usually use the "Page Life Expectancy" to check for memory pressure, but everyone has his ways
@chenteddybear,
Is using the Page Life Expectancy something you have considered?
Otherwise, are you going to put up a PR?
Thanks!
Hi @powersj Page Life Expectancy is not enough to indicate memory pressure / memory leak. The new metric may be useful in advising DBAs to investigate SQL memory clerks for odd behavior while troubleshooting.
As for a PR, I'm not sure exactly how to proceed, as it'll be my first time. Thank you.
@chenteddybear in 20-30mins there will be artifacts that you can download attached to the following PR: https://github.com/influxdata/telegraf/pull/14792
Could you please download and confirm that the new metric is visible? Thanks
That would be great to get it added as that a very useful stat to collect.
I'd mentioned it back here #9112 that it was missing from v2 stats onwards as well as Database Cache Memory and Free Memory, but never chased it up as just had to do it via an exec command.
It would be great to get these other two stats added in too.
@chenteddybear and @simonwhybrow can you please test the binary in PR #14792 and verify it produces what you expect!?
@chenteddybear and @simonwhybrow can you please test the binary in PR #14792 and verify it produces what you expect!?
Yep, looks good. I can see it pulling through Stolen memory. Any chance we can get the other two metrics added as mentioned in #9112 - Database Cache Memory (KB) and Free Memory (KB)? Or does that need to be a separate PR?
@powersj do you think we should add the two metric mentioned by @simonwhybrow?
I've updated the PR with the new additional metrics.
@simonwhybrow @chenteddybear,
Can you verify the updates look correct. Thanks
@powersj looks good. Thank you for adding those counters.
looks great! thank you for your help!