Enable metric tags using volume comment for zapiperf objects.
Is your feature request related to a problem? Please describe. We recently implemented metric tagging using the volume comment (for capacity metrics); however, we are unable to do the same for performance metrics. I need to display perf metrics leveraging the K:V pairs within volume comments.
Describe the solution you'd like Either a similar implementation (in the Harvest 2 poller configuration) to what is currently possible for capacity metrics, or an alternative solution (see below).
Describe alternatives you've considered Chris Grindstaff raised the idea of a custom plugin (volume-tagger). If this idea can be proven out, we would like documentation on creating such a custom plugin.
Additional context
https://netapppub.slack.com/archives/C02072M1UCD/p1639154984401500?thread_ts=1638986222.380400&cid=C02072M1UCD
Curious if there are any updates for this?
@chadpruden We'll discuss this for our next release and update here. Thanks for the follow up.
@chadpruden I have found a way to merge comment information into ZapiPerf counters, and it doesn't require any plugin. Below are the steps.
1. Modify the Zapi volume.yaml template as below. This adds the comment counter and exports comment and instance_uuid from the existing template.
name: Volume
query: volume-get-iter
object: volume

# increase client timeout for volumes
client_timeout: 2m

counters:
  volume-attributes:
    - volume-autosize-attributes:
        - maximum-size
        - grow-threshold-percent
    - volume-id-attributes:
        - ^^instance-uuid => instance_uuid
        - ^name => volume
        - ^node => node
        - ^owning-vserver-name => svm
        - ^containing-aggregate-name => aggr
        - ^containing-aggregate-uuid => aggrUuid
        - ^style-extended => style
        - ^type => type
        - ^comment => comment
    - volume-inode-attributes:
        - files-used
        - files-total
    - volume-sis-attributes:
        - compression-space-saved => sis_compress_saved
        - deduplication-space-saved => sis_dedup_saved
        - total-space-saved => sis_total_saved
        - percentage-compression-space-saved => sis_compress_saved_percent
        - percentage-deduplication-space-saved => sis_dedup_saved_percent
        - percentage-total-space-saved => sis_total_saved_percent
        - ^is-sis-volume => is_sis_volume
    - volume-space-attributes:
        - expected-available
        - filesystem-size => filesystem_size
        - logical-available
        - logical-used
        - logical-used-by-afs
        - logical-used-by-snapshots
        - logical-used-percent
        - physical-used
        - physical-used-percent
        - size => size
        - size-available => size_available
        - size-total => size_total
        - size-used => size_used
        - percentage-size-used => size_used_percent
        - size-used-by-snapshots => snapshots_size_used
        - size-available-for-snapshots => snapshots_size_available
        - snapshot-reserve-available => snapshot_reserve_available
        - snapshot-reserve-size => snapshot_reserve_size
        - percentage-snapshot-reserve => snapshot_reserve_percent
        - percentage-snapshot-reserve-used => snapshot_reserve_used_percent
    - volume-state-attributes:
        - ^state
        - ^status
    - volume-snapshot-attributes:
        - ^auto-snapshots-enabled => auto_snapshots_enabled
        - ^snapshot-policy
        - snapshot-count
    - ^encrypt => isEncrypted

plugins:
  Volume:
    schedule:
      - data: 900s # should be multiple of data poll duration
    #batch_size: "50"
  LabelAgent:
    # metric label zapi_value rest_value `default_value`
    value_to_num:
      - new_status state online online `0`
    exclude_equals:
      - style `flexgroup_constituent`
    # To prevent visibility of transient volumes, uncomment the following lines
    # exclude_regex:
    #   # Exclude SnapProtect/CommVault Intellisnap, Clone volumes have a “_CVclone” suffix
    #   - volume `.+_CVclone`
    #   # Exclude SnapCenter, Clone volumes have a “DDMMYYhhmmss” suffix
    #   - volume `.+(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])\d\d[0-9]{6}`
    #   # Exclude manually created SnapCreator clones, Clone volumes have a “cl_” prefix and a “_YYYYMMDDhhmmss” suffix
    #   - volume `cl_.+_(19|20)\d\d(0[1-9]|1[012])( 0[1-9]|[12][0-9]|3[01])[0-9]{6}`
    #   # Exclude SnapDrive/SnapManager, Clone volumes have a “sdw_cl_” prefix
    #   - volume `sdw_cl_.+`
    #   # Exclude Metadata volumes, CRS volumes in SVM-DR or MetroCluster have a “MDV_CRS_” prefix
    #   - volume `MDV_CRS_.+`
    #   # Exclude Metadata volumes, Audit volumes have a “MDV_aud_” prefix
    #   - volume `MDV_aud_.+`
    replace:
      - style style `flexgroup_constituent` `flexgroup`
  Aggregator:
    - volume<style=flexgroup>volume node,svm,aggr,style

export_options:
  instance_keys:
    - volume
    - node
    - svm
    - aggr
    - style
  instance_labels:
    - state
    - is_sis_volume
    - snapshot_policy
    - type
    - protectedByStatus
    - protectedBy
    - protectionRole
    - all_sm_healthy
    - isEncrypted
    - isHardwareEncrypted
    - comment
    - instance_uuid
2. Modify the ZapiPerf volume.yaml template as below. This adds instance_uuid to the counters and exports.
name: Volume
query: volume
object: volume

instance_key: uuid

counters:
  - instance_uuid
  - instance_name
  - vserver_name => svm
  - node_name => node
  - parent_aggr => aggr
  - read_data
  - write_data
  - read_ops
  - write_ops
  - other_ops
  - total_ops
  - read_latency
  - write_latency
  - other_latency
  - avg_latency

plugins:
  - Volume
#  - LabelAgent:
#      # To prevent visibility of transient volumes, uncomment the following lines
#      exclude_regex:
#        # Exclude SnapProtect/CommVault Intellisnap, Clone volumes have a “_CVclone” suffix
#        - volume `.+_CVclone`
#        # Exclude SnapCenter, Clone volumes have a “DDMMYYhhmmss” suffix
#        - volume `.+(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])\d\d[0-9]{6}`
#        # Exclude manually created SnapCreator clones, Clone volumes have a “cl_” prefix and a “_YYYYMMDDhhmmss” suffix
#        - volume `cl_.+_(19|20)\d\d(0[1-9]|1[012])( 0[1-9]|[12][0-9]|3[01])[0-9]{6}`
#        # Exclude SnapDrive/SnapManager, Clone volumes have a “sdw_cl_” prefix
#        - volume `sdw_cl_.+`
#        # Exclude Metadata volumes, CRS volumes in SVM-DR or MetroCluster have a “MDV_CRS_” prefix
#        - volume `MDV_CRS_.+`
#        # Exclude Metadata volumes, Audit volumes have a “MDV_aud_” prefix
#        - volume `MDV_aud_.+`

export_options:
  instance_keys:
    - volume
    - node
    - svm
    - aggr
    - style
    - instance_uuid
3. Restart the Harvest pollers and, at least 5 minutes after the pollers start, run the query below in Prometheus. It merges the comment data from volume_labels into the volume_read_data ZapiPerf metric. You'll need to do the same for the other volume metrics (see the example after the query).
volume_read_data * on(instance_uuid) group_left(comment) volume_labels
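For example, here is the identical join applied to another volume metric (a sketch; volume_write_data is assumed to be the metric produced by the write_data counter above):

```promql
# Attach the comment label to the write-throughput metric as well
# (assumes volume_write_data and volume_labels both carry an instance_uuid label)
volume_write_data * on(instance_uuid) group_left(comment) volume_labels
```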
Let us know if that works.
@chadpruden Let us know if the above solution helps.
@rahulguptajss We are currently working through implementing the solution you have outlined for us above; however, we are stuck on this section, as we use InfluxDB and not Prometheus:
> 3. Restart the Harvest pollers and, at least 5 minutes after the pollers start, run the query below in Prometheus. It merges the comment data from volume_labels into the volume_read_data ZapiPerf metric. You'll need to do the same for the other volume metrics.
> `volume_read_data * on(instance_uuid) group_left(comment) volume_labels`
Can you tell us whether there is an InfluxDB equivalent to this query that will achieve the same result, or is this not possible / applicable with Influx?
Also, we are curious how your write-up / recommendations above would differ if we were not using the out-of-the-box *.yaml collectors. For example, we would like to implement your solution using the following as a custom collector object:
name: Harvest_Custom_Volume
query: volume
object: harvest_custom_volume
We are hesitant to modify the standard collectors as we rely on them for (many) distributed dashboards, so we would like to sandbox / test this first using custom collectors before we implement this more broadly.
Thanks much for your help and support, Rahul
-Scott
Hi Scott - Yes, those changes can be made in separate template files so the out-of-the-box ones are not touched. I've outlined the steps below.
Regarding InfluxDB, what Rahul pasted above is a Prometheus join of volume_read_data and volume_labels. InfluxDB 2 supports joins too, although it's been a while since I've used them. I'll see if I can dig something up.
New templates for Volume_With_Tags
Summary
We're going to create two custom.yaml files, one for the Zapi collector and another for the ZapiPerf collector. Those two custom.yaml files will include the templates Rahul shared above.
- Create conf/zapi/custom.yaml
- Create conf/zapi/cdot/9.8.0/volume_with_tag.yaml
- Create conf/zapiperf/custom.yaml
- Create conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml
Details
If you cd to your Harvest install directory, you can copy/paste the following code sections to create the files.
Create conf/zapi/custom.yaml
echo '
objects:
  VolWithTag: volume_with_tag.yaml
' > conf/zapi/custom.yaml
Create conf/zapi/cdot/9.8.0/volume_with_tag.yaml
echo '
name: Volume
query: volume-get-iter
object: volwithtag

# increase client timeout for volumes
client_timeout: 2m

counters:
  volume-attributes:
    - volume-autosize-attributes:
        - maximum-size
        - grow-threshold-percent
    - volume-id-attributes:
        - ^^instance-uuid => instance_uuid
        - ^name => volume
        - ^node => node
        - ^owning-vserver-name => svm
        - ^containing-aggregate-name => aggr
        - ^containing-aggregate-uuid => aggrUuid
        - ^style-extended => style
        - ^type => type
        - ^comment => comment
    - volume-inode-attributes:
        - files-used
        - files-total
    - volume-sis-attributes:
        - compression-space-saved => sis_compress_saved
        - deduplication-space-saved => sis_dedup_saved
        - total-space-saved => sis_total_saved
        - percentage-compression-space-saved => sis_compress_saved_percent
        - percentage-deduplication-space-saved => sis_dedup_saved_percent
        - percentage-total-space-saved => sis_total_saved_percent
        - ^is-sis-volume => is_sis_volume
    - volume-space-attributes:
        - expected-available
        - filesystem-size => filesystem_size
        - logical-available
        - logical-used
        - logical-used-by-afs
        - logical-used-by-snapshots
        - logical-used-percent
        - physical-used
        - physical-used-percent
        - size => size
        - size-available => size_available
        - size-total => size_total
        - size-used => size_used
        - percentage-size-used => size_used_percent
        - size-used-by-snapshots => snapshots_size_used
        - size-available-for-snapshots => snapshots_size_available
        - snapshot-reserve-available => snapshot_reserve_available
        - snapshot-reserve-size => snapshot_reserve_size
        - percentage-snapshot-reserve => snapshot_reserve_percent
        - percentage-snapshot-reserve-used => snapshot_reserve_used_percent
    - volume-state-attributes:
        - ^state
        - ^status
    - volume-snapshot-attributes:
        - ^auto-snapshots-enabled => auto_snapshots_enabled
        - ^snapshot-policy
        - snapshot-count
    - ^encrypt => isEncrypted

plugins:
  Volume:
    schedule:
      - data: 900s # should be multiple of data poll duration
    #batch_size: "50"
  LabelAgent:
    # metric label zapi_value rest_value `default_value`
    value_to_num:
      - new_status state online online `0`
    exclude_equals:
      - style `flexgroup_constituent`
    # To prevent visibility of transient volumes, uncomment the following lines
    # exclude_regex:
    #   # Exclude SnapProtect/CommVault Intellisnap, Clone volumes have a “_CVclone” suffix
    #   - volume `.+_CVclone`
    #   # Exclude SnapCenter, Clone volumes have a “DDMMYYhhmmss” suffix
    #   - volume `.+(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])\d\d[0-9]{6}`
    #   # Exclude manually created SnapCreator clones, Clone volumes have a “cl_” prefix and a “_YYYYMMDDhhmmss” suffix
    #   - volume `cl_.+_(19|20)\d\d(0[1-9]|1[012])( 0[1-9]|[12][0-9]|3[01])[0-9]{6}`
    #   # Exclude SnapDrive/SnapManager, Clone volumes have a “sdw_cl_” prefix
    #   - volume `sdw_cl_.+`
    #   # Exclude Metadata volumes, CRS volumes in SVM-DR or MetroCluster have a “MDV_CRS_” prefix
    #   - volume `MDV_CRS_.+`
    #   # Exclude Metadata volumes, Audit volumes have a “MDV_aud_” prefix
    #   - volume `MDV_aud_.+`
    replace:
      - style style `flexgroup_constituent` `flexgroup`
  Aggregator:
    - volume<style=flexgroup>volume node,svm,aggr,style

export_options:
  instance_keys:
    - volume
    - node
    - svm
    - aggr
    - style
  instance_labels:
    - state
    - is_sis_volume
    - snapshot_policy
    - type
    - protectedByStatus
    - protectedBy
    - protectionRole
    - all_sm_healthy
    - isEncrypted
    - isHardwareEncrypted
    - comment
    - instance_uuid
' > conf/zapi/cdot/9.8.0/volume_with_tag.yaml
Create conf/zapiperf/custom.yaml
echo '
objects:
  VolWithTag: volume_with_tag.yaml
' > conf/zapiperf/custom.yaml
Create conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml
echo '
name: Volume
query: volume
object: volwithtag

instance_key: uuid

counters:
  - instance_uuid
  - instance_name => volume
  - vserver_name => svm
  - node_name => node
  - parent_aggr => aggr
  - read_data
  - write_data
  - read_ops
  - write_ops
  - other_ops
  - total_ops
  - read_latency
  - write_latency
  - other_latency
  - avg_latency

plugins:
  - Volume
#  - LabelAgent:
#      # To prevent visibility of transient volumes, uncomment the following lines
#      exclude_regex:
#        # Exclude SnapProtect/CommVault Intellisnap, Clone volumes have a “_CVclone” suffix
#        - volume `.+_CVclone`
#        # Exclude SnapCenter, Clone volumes have a “DDMMYYhhmmss” suffix
#        - volume `.+(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])\d\d[0-9]{6}`
#        # Exclude manually created SnapCreator clones, Clone volumes have a “cl_” prefix and a “_YYYYMMDDhhmmss” suffix
#        - volume `cl_.+_(19|20)\d\d(0[1-9]|1[012])( 0[1-9]|[12][0-9]|3[01])[0-9]{6}`
#        # Exclude SnapDrive/SnapManager, Clone volumes have a “sdw_cl_” prefix
#        - volume `sdw_cl_.+`
#        # Exclude Metadata volumes, CRS volumes in SVM-DR or MetroCluster have a “MDV_CRS_” prefix
#        - volume `MDV_CRS_.+`
#        # Exclude Metadata volumes, Audit volumes have a “MDV_aud_” prefix
#        - volume `MDV_aud_.+`

export_options:
  instance_keys:
    - volume
    - node
    - svm
    - aggr
    - style
    - instance_uuid
' > conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml
You can test from the command line, like so, to verify everything's working. Change $poller to match your poller.
bin/poller --poller $poller --objects VolWithTag
Logs showing the custom templates are being used
2022-10-05T08:37:40-04:00 INF collector/helpers.go:134 > best-fit template Poller=u2 collector=Zapi:VolWithTag path=conf/zapi/cdot/9.8.0/volume_with_tag.yaml v=9.9.1
2022-10-05T08:37:41-04:00 INF collector/helpers.go:134 > best-fit template Poller=u2 collector=ZapiPerf:VolWithTag path=conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml v=9.9.1
@cgrinds Would this custom collector for volume_with_tags be publishing to the same measurement name (object: volume) as the out-of-the-box collector, thus causing double-counting (two parallel collections) of our FlexVols into Influx?
@chadpruden good catch, yes it would. I updated the example and changed the object name to volwithtag. Any name will work.
@cgrinds Thank you for all your assistance with this Chris, much appreciated.
We have followed your write-up and believe we are close to having it working, but are stuck on one aspect we are hoping you can shed some light on. The poller sees our custom object ("TAPIVolume") and we see this custom tag being picked up in Grafana, but we are not seeing any metrics flowing, either in the logs or in Grafana.
We noticed the following in the logs and are wondering whether either of these errors is indicative of a known / common misconfiguration, or whether there is something else you recommend we look at to advance our troubleshooting.
{"level":"error","Poller":"fas-xxxxxx","collector":"ZapiPerf:TAPIVolume","stack":[{"func":"New","line":"35","source":"errors.go"},{"func":"(*Client).invoke","line":"366","source":"client.go"},{"func":"(*Client).InvokeBatchWithTimers","line":"285","source":"client.go"},{"func":"(*Client).InvokeBatchRequest","line":"258","source":"client.go"},{"func":"(*ZapiPerf).PollInstance","line":"1169","source":"zapiperf.go"},{"func":"(*task).Run","line":"60","source":"schedule.go"},{"func":"(*AbstractCollector).Start","line":"269","source":"collector.go"},{"func":"goexit","line":"1371","source":"asm_amd64.s"}],"error":"connection error => Post "https://fas-xxxxxx.domain.com:443/servlets/netapp.servlets.admin.XMLrequest_filer": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","caller":"goharvest2/cmd/collectors/zapiperf/zapiperf.go:1170","time":"2022-10-05T15:57:58-05:00","message":"instance request"}
{"level":"info","Poller":"fas-xxxxxx","collector":"ZapiPerf:TAPIVolume","caller":"goharvest2/cmd/poller/collector/collector.go:295","time":"2022-10-05T15:57:58-05:00","message":"no [TAPIVolume] instances on system, entering standby mode"}
Kind regards,
-Scott
@electrocreative There is a timeout error in the logs. The default timeout for the ZapiPerf collector is 10s. You can increase it by adding client_timeout to the ZapiPerf:TAPIVolume template as mentioned here. Let's try 30s; this value should ideally be less than the polling frequency of the collector, which is 1m by default for ZapiPerf. A sketch of that change is shown below.
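As a minimal sketch, assuming the volume_with_tag.yaml ZapiPerf template from earlier (adjust the object name to match your TAPIVolume template), client_timeout is a top-level key in the template:

```yaml
# conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml (top of file)
name: Volume
query: volume
object: volwithtag

# raise the ZAPI client timeout from the 10s default; keep it below the 1m poll interval
client_timeout: 30s

instance_key: uuid
# ... counters, plugins, and export_options unchanged from the template above
```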
@electrocreative I updated the conf/zapiperf/cdot/9.8.0/volume_with_tag.yaml template shared yesterday with a one-line change to the instance_name counter.
Replace this
`- instance_name`
with this
`- instance_name => volume`
Not sure if you sorted out the Flux join query or not, but @rahulguptajss and I looked at it today and managed to get this working. Not sure if it exactly fits your case, but sharing in case it helps.
import "join"
import "influxdata/influxdb/schema"
left = from(bucket: "harvest")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "volwithtag")
|> filter(fn: (r) => r["_field"] == "read_latency")
|> schema.fieldsAsCols()
right = from(bucket: "harvest")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "volwithtag")
|> filter(fn: (r) => r["_field"] == "comment")
|> schema.fieldsAsCols()
join.right(
left: left,
right: right,
on: (l, r) => l.instance_uuid == r.instance_uuid,
as: (l, r) => ({l with comment: r.comment}),
)
Verified in 22.11. @chadpruden Let us know your feedback.
To use the plugin, you need to enable the VolumeTag plugin in the ZapiPerf volume template. Example below.
name: Volume
query: volume
object: volume

instance_key: uuid

counters:
  - instance_uuid
  - instance_name
  - vserver_name => svm
  - node_name => node
  - parent_aggr => aggr
  - read_data
  - write_data
  - read_ops
  - write_ops
  - other_ops
  - total_ops
  - read_latency
  - write_latency
  - other_latency
  - avg_latency

plugins:
  - Volume
  - VolumeTag
  - Aggregator:
    - node

export_options:
  instance_keys:
    - volume
    - node
    - svm
    - aggr
    - style