
Qtree metrics are intermittently showing 0 value

Open jmg011 opened this issue 2 years ago • 21 comments

qtree_disk_limit intermittently alternates between 0 and the correct value. qtree_disk_used shows the same behavior.

An example of a qtree_disk_limit

(screenshot)

An example of a qtree_disk_used

(screenshot)

jmg011 avatar May 17 '22 15:05 jmg011

hi @jmg011 few questions:

  • what version of Harvest?
  • what version of ONTAP cluster?
  • is this poller using the out-of-the-box qtree.yaml template or has it been customized? If so, can you share your template?
  • maybe the ZAPI is timing out when collecting qtree or quota. Any errors in the log files for the poller in question? Maybe check with something like this
grep -R -E -i "qtree" /var/log/harvest/

cgrinds avatar May 17 '22 16:05 cgrinds

@cgrinds

Harvest version: harvest version 22.03.3118-nightly

cluster version: NetApp Release 9.7P7

qtree template:

name:   Qtree
query:  qtree-list-iter
object: qtree

counters:
  qtree-info:
    - ^export-policy
    - id
    - ^oplocks
    - ^security-style
    - ^status
    - ^^volume
    - ^^vserver => svm
    - ^^qtree => qtree

collect_only_labels: true

plugins:
  Qtree:
    objects:
      - disk-limit
      - disk-used
      - disk-used-pct-disk-limit
      - disk-used-pct-soft-disk-limit
      - disk-used-pct-threshold
      - file-limit
      - files-used
      - files-used-pct-file-limit
      - files-used-pct-soft-file-limit
      - soft-disk-limit
      - soft-file-limit
      - threshold
    #batch_size: "50"

export_options:
  instance_keys:
    - qtree
    - volume
    - svm
  instance_labels:
    - export_policy
    - oplocks
    - security_style
    - status

I do see a lot of "lagging behind schedule" messages.

Could this be the cause of this problem?

jmg011 avatar May 21 '22 15:05 jmg011

@jmg011 Yes, most likely it is due to high ZAPI time. Could you share the Harvest logs? There have been a few improvements (https://github.com/NetApp/harvest/pull/887) in this area, but your build should already include them.

rahulguptajss avatar May 23 '22 08:05 rahulguptajss

@rahulguptajss - pasting the latest log for qtree.

{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T02:55:24-07:00","message":"lagging behind schedule 2m6.745095154s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:07:24-07:00","message":"lagging behind schedule 1m59.945925487s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:19:14-07:00","message":"lagging behind schedule 1m49.569946339s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:31:13-07:00","message":"lagging behind schedule 1m59.618461717s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:43:21-07:00","message":"lagging behind schedule 2m8.042717764s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:46:28-07:00","message":"lagging behind schedule 354.349763ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:55:14-07:00","message":"lagging behind schedule 1m53.0319939s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T03:56:26-07:00","message":"lagging behind schedule 354.251614ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:06:25-07:00","message":"lagging behind schedule 51.059909ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:07:19-07:00","message":"lagging behind schedule 2m4.96486915s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:19:20-07:00","message":"lagging behind schedule 2m0.588789669s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:31:21-07:00","message":"lagging behind schedule 2m0.778544139s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:43:13-07:00","message":"lagging behind schedule 1m52.423609926s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T04:55:13-07:00","message":"lagging behind schedule 2m0.172452415s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:07:17-07:00","message":"lagging behind schedule 2m4.071615049s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:19:18-07:00","message":"lagging behind schedule 2m0.685482392s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:26:25-07:00","message":"lagging behind schedule 287.670111ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:31:16-07:00","message":"lagging behind schedule 1m58.258515463s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:43:08-07:00","message":"lagging behind schedule 1m52.040378353s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:46:26-07:00","message":"lagging behind schedule 146.40094ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:46:26-07:00","message":"lagging behind schedule 61.422518ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:46:28-07:00","message":"lagging behind schedule 350.172568ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:55:34-07:00","message":"lagging behind schedule 2m25.536713797s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T05:56:26-07:00","message":"lagging behind schedule 297.89383ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:06:26-07:00","message":"lagging behind schedule 416.499005ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:07:30-07:00","message":"lagging behind schedule 1m55.857006194s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:16:26-07:00","message":"lagging behind schedule 155.940721ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:19:21-07:00","message":"lagging behind schedule 1m51.141898511s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:31:19-07:00","message":"lagging behind schedule 1m58.111165053s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:43:20-07:00","message":"lagging behind schedule 2m1.086522548s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:55:22-07:00","message":"lagging behind schedule 2m1.917283385s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T06:56:26-07:00","message":"lagging behind schedule 151.205796ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:07:31-07:00","message":"lagging behind schedule 2m9.103080541s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:19:21-07:00","message":"lagging behind schedule 1m49.639814313s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:31:21-07:00","message":"lagging behind schedule 2m0.603065251s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:43:16-07:00","message":"lagging behind schedule 1m54.513746789s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:46:26-07:00","message":"lagging behind schedule 340.219746ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:46:26-07:00","message":"lagging behind schedule 96.965232ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:55:29-07:00","message":"lagging behind schedule 2m12.892218101s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T07:56:27-07:00","message":"lagging behind schedule 77.923446ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:07:32-07:00","message":"lagging behind schedule 2m3.065420948s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:19:19-07:00","message":"lagging behind schedule 1m47.503173246s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:26:27-07:00","message":"lagging behind schedule 379.800237ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:31:19-07:00","message":"lagging behind schedule 2m0.113711006s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:43:18-07:00","message":"lagging behind schedule 1m58.595001271s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T08:55:30-07:00","message":"lagging behind schedule 2m12.113271673s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:06:27-07:00","message":"lagging behind schedule 370.760259ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:07:29-07:00","message":"lagging behind schedule 1m58.960357827s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:19:24-07:00","message":"lagging behind schedule 1m54.781487926s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:31:21-07:00","message":"lagging behind schedule 1m56.9040141s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:43:19-07:00","message":"lagging behind schedule 1m58.428028421s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:46:27-07:00","message":"lagging behind schedule 63.795901ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T09:55:24-07:00","message":"lagging behind schedule 2m4.712383403s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T10:07:22-07:00","message":"lagging behind schedule 1m58.053733474s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T10:19:21-07:00","message":"lagging behind schedule 1m58.66604585s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T10:31:18-07:00","message":"lagging behind schedule 1m57.497144978s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T10:43:21-07:00","message":"lagging behind schedule 2m2.851380127s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T10:55:27-07:00","message":"lagging behind schedule 2m5.925466979s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:06:27-07:00","message":"lagging behind schedule 339.908065ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:07:34-07:00","message":"lagging behind schedule 2m6.926451117s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:19:19-07:00","message":"lagging behind schedule 1m45.480788666s"}
{"level":"warn","Poller":"MYPOLLER","collector":"ZapiPerf:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:26:28-07:00","message":"lagging behind schedule 70.690291ms"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:31:18-07:00","message":"lagging behind schedule 1m59.04125278s"}
{"level":"warn","Poller":"MYPOLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-05-26T11:43:18-07:00","message":"lagging behind schedule 1m59.477889306s"}
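As an aside for anyone triaging logs like these: the warnings can be summarized programmatically. The sketch below is not part of Harvest; it simply assumes the JSON record layout shown above, converts Go-style duration strings (e.g. 2m6.745095154s) into seconds, and reports the worst lag seen per collector:

```python
import json
import re

# Multipliers for the Go duration units that appear in these logs.
_UNIT = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3}

def go_duration_seconds(text):
    """Sum every '<number><unit>' pair in a Go-style duration string."""
    total = 0.0
    # 'ms' must be tried before 'm' so milliseconds aren't read as minutes.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(h|ms|m|s)", text):
        total += float(value) * _UNIT[unit]
    return total

def max_lag_per_collector(lines):
    """Return {collector: worst lag in seconds} from JSON log records."""
    worst = {}
    for line in lines:
        rec = json.loads(line)
        msg = rec.get("message", "")
        if not msg.startswith("lagging behind schedule"):
            continue
        name = rec.get("collector", "?")
        worst[name] = max(worst.get(name, 0.0), go_duration_seconds(msg))
    return worst
```

Feeding the records above through max_lag_per_collector would show Zapi:Qtree lagging by roughly two minutes per cycle while ZapiPerf:Qtree stays in the sub-second range.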

jmg011 avatar May 26 '22 23:05 jmg011

@jmg011 Thanks. Could you share the full log file for this poller as well?

rahulguptajss avatar May 27 '22 06:05 rahulguptajss

@rahulguptajss : Attached a log file:

poller_log.txt

jmg011 avatar Jun 01 '22 01:06 jmg011

@cgrinds @rahulguptajss I'm seeing another issue. While debugging, I observed that Harvest collects qtrees from "quota report" on the filer. But when I run the "qtree show" command, there are many more qtrees that do not show up in the quota report on the filer, and so those are not collected by Harvest. Can you confirm that this is the expected behavior?

jmg011 avatar Jun 03 '22 16:06 jmg011

@jmg011

Harvest collects qtree information via the qtree-list-iter ZAPI. In addition, the Qtree plugin collects quota information (tree quotas only) via quota-report-iter.

Quota details only become available after a quota has been created. So it is expected that quota metrics won't show up for all qtrees when tree quotas were created on only a few of them.

Example: in my setup, Harvest collects info for all 337 qtrees, whereas only 5 qtrees have a tree quota created on them.

hardikl@hardikl-mac-0 % curl -s http://localhost:12991/metrics | grep qtree_labels | wc -l
     337
hardikl@hardikl-mac-0 % curl -s http://localhost:12991/metrics | grep -i "qtree_disk_used{" | wc -l
       5

hardikl@hardikl-mac-0 % curl -s http://localhost:12991/metrics | grep -i "qtree_disk_used{"       
qtree_disk_used{cluster="ocum-infinity",datacenter="zapi",qtree="s3auto-oz1",volume="fg_oss_1615947890",svm="vs0",export_policy="default",oplocks="enabled",security_style="mixed",status="normal"} 0
qtree_disk_used{cluster="ocum-infinity",datacenter="zapi",qtree="fg_oss_1615947890",volume="fg_oss_1615947890",svm="vs0",export_policy="default",oplocks="enabled",security_style="mixed",status="normal"} 0
qtree_disk_used{cluster="ocum-infinity",datacenter="zapi",qtree="test123",volume="test123",svm="test",export_policy="default",oplocks="enabled",security_style="unix",status="normal"} 0
qtree_disk_used{cluster="ocum-infinity",datacenter="zapi",qtree="qtree1",volume="test123",svm="test",export_policy="default",oplocks="enabled",security_style="unix",status="normal"} 0
qtree_disk_used{cluster="ocum-infinity",datacenter="zapi",qtree="qt1",volume="test123",svm="test",export_policy="default",oplocks="enabled",security_style="unix",status="normal"} 0
hardikl@hardikl-mac-0 % 

Hardikl avatar Jun 06 '22 09:06 Hardikl

@Hardikl - Thank you for clarifying my doubt and providing the example.

@cgrinds @rahulguptajss Awaiting response on the log file I shared, when you get a chance.

jmg011 avatar Jun 06 '22 16:06 jmg011

hi @jmg011 We took a look at your poller logs, thanks! There is some lagging in there, but it's not obvious what's going on with the qtree plugin. We're going to add more logging around this and provide a nightly build if you're interested in using it to gather more info.

cgrinds avatar Jun 06 '22 18:06 cgrinds

@cgrinds Sure, I can test a few clusters with the nightly release. Let me know when it is available.

jmg011 avatar Jun 06 '22 21:06 jmg011

@jmg011 the logging changes will be in the 2022-06-08 nightly around midnight EDT. When you get a chance, please grab the latest nightly build, launch your poller with loglevel=1, and try one of the clusters showing intermittent qtree zeroes. Something like this

bin/harvest start POLLER_NAME --loglevel=1 --collectors Zapi --objects Qtree

cgrinds avatar Jun 07 '22 19:06 cgrinds

@cgrinds Thank you. I will test this tomorrow and report back the findings.

jmg011 avatar Jun 07 '22 21:06 jmg011

thanks @jmg011 - we're experimenting with ways to make sharing log files easier. If you want, you can zip, gzip, etc. your log files for this poller and email them to this no-reply email address set up to receive files.

cgrinds avatar Jun 08 '22 15:06 cgrinds

hi @jmg011 checking in on this one. Is this a problem you're still seeing? If so, have you had a chance to try a nightly build, launch your poller with loglevel=1, and test one of the clusters showing intermittent qtree zeroes?

If so, you can send your log files to this no-reply email address set up to receive files.

cgrinds avatar Jun 28 '22 12:06 cgrinds

@cgrinds - yes, I actually tested this but forgot to report back. Let me find the report and send it to you on Monday.

jmg011 avatar Jul 16 '22 00:07 jmg011

@cgrinds Thank you for your patience. I ran this again, as I could not find the old report.

Within 5 minutes, many log files were generated for this poller at loglevel 1. I debugged them and sorted out the unique messages for each qtree:

add (disk-limit) instance
add (disk-used) instance
add (disk-used-pct-disk-limit) instance
add (file-limit) instance
add (files-used) instance
add (soft-disk-limit) instance
add (soft-file-limit) instance
add (threshold) instance
(disk-limit) added value
(disk-used) added value
(disk-used-pct-disk-limit) added value
(file-limit) added value
(files-used) added value
(soft-disk-limit) added value
(soft-file-limit) added value
(threshold) added value

Apart from these messages, I see below unique messages for the same poller.

{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:46:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:46:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"info","Poller":"POLLER","plugin":"Zapi:Qtree","object":"Qtree","numQtrees":24063,"numQuotas":154,"metrics":933824,"apiD":"9.321s","parseD":"838ms","batchSize":"500","caller":"qtree/qtree.go:262","time":"2022-07-18T15:46:09-07:00","message":"Collected"}
{"level":"debug","Poller":"POLLER","collector":"Zapi:Qtree","caller":"collector/collector.go:388","time":"2022-07-18T15:46:09-07:00","message":"plugin [Qtree] added (116728) data"}
{"level":"debug","Poller":"POLLER","collector":"Zapi:Qtree","caller":"collector/collector.go:407","time":"2022-07-18T15:46:09-07:00","message":"exporting collected (116729) data"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:47:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:47:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:48:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:48:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:49:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:49:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:50:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:50:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:51:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:51:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:52:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:52:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:53:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:53:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:450","time":"2022-07-18T15:54:04-07:00","message":"collector (Zapi:Qtree) status: (0 - up) running"}
{"level":"debug","Poller":"POLLER","caller":"./poller.go:472","time":"2022-07-18T15:54:04-07:00","message":"exporter (prom36) status: (0 - up) initialized"}
{"level":"info","Poller":"POLLER","caller":"./poller.go:526","time":"2022-07-18T15:54:23-07:00","message":"caught signal [terminated]"}
{"level":"info","Poller":"POLLER","caller":"./poller.go:519","time":"2022-07-18T15:54:23-07:00","message":"cleaning up and stopping [pid=29024]"}

In Production, the same poller shows 54 hours of lag:

The reason I did not send all the files for this poller is that the qtree names contain many project names that we would prefer not to publish.

cat poller.log | grep -i Qtree | grep -v ZapiPerf
{"level":"warn","Poller":"POLLER","collector":"Zapi:Qtree","caller":"goharvest2/cmd/poller/collector/collector.go:430","time":"2022-07-17T21:52:50-07:00","message":"lagging behind schedule 54h15m18.115913235s"}

jmg011 avatar Jul 18 '22 23:07 jmg011

that's quite a lag :open_mouth:

thanks for running with nightly and sharing log snippets.

understood on the production names - that's why I set up an email address ([email protected]) that goes to NetApp Harvest employees only

Looks like this system has

numQtrees:  24,063
metrics:   933,824

It only took 9.321 seconds to collect them from ONTAP and another 838ms to parse them. All of which is fast enough.

Is production still running 22.03.3118-nightly, and was the most recent nightly run against the same cluster? If so, it seems the more recent build lags much less than production. Is that correct?

cgrinds avatar Jul 19 '22 12:07 cgrinds

@cgrinds - Thank you.

Yes, I captured logs on the same cluster from Dev environment running nightly and Production environment running latest major release.

The logs were captured in Dev on Harvest version: harvest version 22.06.16-nightly (commit e51ec2b) (build date 2022-06-16T00:22:08-0400) linux/amd64

The lag was captured in Production on Harvest version: harvest version 22.05.0-1 (commit 2bc2942) (build date 2022-05-11T07:57:16-0400) linux/amd64

jmg011 avatar Jul 19 '22 17:07 jmg011

@jmg011 Could you send us the full logs at the email address ([email protected]) that goes to NetApp Harvest employees only?

rahulguptajss avatar Aug 09 '22 13:08 rahulguptajss

@jmg011 We have made improvements in this area in release 22.08. We have also added support for user and group quotas, as mentioned here.

rahulguptajss avatar Aug 22 '22 06:08 rahulguptajss

Closing as fixed. Post 22.08, we improved how Harvest handles ONTAP zeros and non-monotonically increasing counters. These changes will be included in 22.11 and have been in nightly since commit 5c2031a299519eb806a053ec10c0a56161c016cb.

Please reopen if you hit this after upgrading to nightly or 22.11.

cgrinds avatar Oct 05 '22 18:10 cgrinds