
volume profile measurement

onnorom opened this issue • 6 comments

Hi, I am wondering if there's anything I may be missing that needs to be enabled in order for me to get the gluster_volume_profile_* measurements. I have set the necessary collectors in /etc/gluster-exporter/gluster-exporter.toml as below:

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

[collectors.gluster_volume_counts]
name = "gluster_volume_counts"
sync-interval = 5
disabled = false

[collectors.gluster_volume_heal]
name = "gluster_volume_heal"
sync-interval = 5
disabled = false

However, I don't see any measurements collected with those names.

I do see the following measurements:

gluster_brick_capacity_bytes_total
gluster_brick_capacity_free_bytes
gluster_brick_capacity_used_bytes
gluster_brick_inodes_free
gluster_brick_inodes_total
gluster_brick_inodes_used
gluster_brick_lv_metadata_percent
gluster_brick_lv_metadata_size_bytes
gluster_brick_lv_percent
gluster_brick_lv_size_bytes
gluster_brick_up
gluster_cpu_percentage
gluster_elapsed_time_seconds
gluster_memory_percentage
gluster_process:gluster_cpu_percentage:avg1h
gluster_process:gluster_elapsed_time_seconds:rate5m
gluster_process:gluster_memory_percentage:avg1h
gluster_resident_memory_bytes
gluster_subvol_capacity_total_bytes
gluster_subvol_capacity_used_bytes
gluster_vg_extent_alloc_count
gluster_vg_extent_total_count
gluster_virtual_memory_bytes
gluster_volume_heal_count
gluster_volume_split_brain_heal_count
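As a quick way to see which metric families the exporter currently exposes, a minimal sketch along these lines (assuming the default port 9713 and /metrics path from the sample config) fetches the metrics page and prints the distinct gluster_* metric names, so a missing gluster_volume_profile_* family stands out immediately:

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Default port and path from the sample gluster-exporter.toml; adjust if needed.
	resp, err := http.Get("http://localhost:9713/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	seen := map[string]bool{}
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip HELP/TYPE comments and blank lines
		}
		// The metric name ends at the first '{' (labels) or ' ' (value).
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if strings.HasPrefix(name, "gluster_") && !seen[name] {
			seen[name] = true
			fmt.Println(name)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}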

onnorom commented on Mar 09 '19

I'm having the same issue.

Profiling is enabled on all volumes and the collectors are configured for gluster-exporter, but there are still no profile metrics.

$ glusterd -V
glusterfs 4.1.8

khalid151 commented on May 04 '19

I've had an issue with a similar symptom: #151

To check if it's the same root cause, could you please try the following:

  • Identify your "leader" node (the one with the highest peer UUID in gluster pool list)
  • Set log-level = "debug" in your gluster-exporter.toml
  • Restart the exporter
  • Look at the exporter logs

If you're seeing logs like level=debug msg="Error getting profile info" error="exit status 1" volume=[volume_name], it's probably the same issue.
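To reproduce that failure by hand, you can run the same kind of gluster CLI call the exporter relies on. The following is only a minimal sketch, not the exporter's actual code; it assumes the gluster binary is on PATH and uses a placeholder volume name that you should replace with yours:

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	vol := "test-volume" // placeholder: replace with your volume name
	// Ask the gluster CLI for profile info, roughly what the exporter does internally.
	out, err := exec.Command("gluster", "volume", "profile", vol, "info").CombinedOutput()
	fmt.Printf("%s", out)
	if err != nil {
		// A failing node reports something like "exit status 1" here,
		// matching the debug log shown above.
		fmt.Println("profile info failed:", err)
	}
}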

Neraud commented on May 22 '19

Thanks! That was the same issue and it's fixed now.

khalid151 commented on May 23 '19

First, start monitoring the workload by enabling profiling.

Start Profiling
You must start profiling to view the file operation information for each brick.

To start profiling, use the following command:

# gluster volume profile <VOLNAME> start

For example, to start profiling on test-volume:


# gluster volume profile test-volume start
Profiling started on test-volume
When profiling is started on the volume, the following additional options are displayed in the volume info output:


diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
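If you have several volumes, the same manual step can be scripted. The sketch below is not part of the project; it assumes the gluster binary is on PATH with sufficient privileges, lists the volumes with gluster volume list, and starts profiling on each one:

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// gluster volume list prints one volume name per line.
	out, err := exec.Command("gluster", "volume", "list").Output()
	if err != nil {
		panic(err)
	}
	for _, vol := range strings.Fields(string(out)) {
		// Equivalent to: gluster volume profile <VOLNAME> start
		res, err := exec.Command("gluster", "volume", "profile", vol, "start").CombinedOutput()
		fmt.Printf("%s: %s", vol, res)
		if err != nil {
			// Non-zero exit usually means profiling is already started or the command failed.
			fmt.Println("  ->", err)
		}
	}
}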

enable "gluster_volume_profile" in config see config template https://github.com/gluster/gluster-prometheus/blob/master/extras/conf/gluster-exporter.toml.sample

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

Second, deploy the exporter on all nodes.

Only the leader node is selected to collect this data, and the exporter treats the peer with the lexicographically maximum UUID as the leader.

In the peer list below, profiling data is only collected on the node with the maximum UUID (b2157fd6-4d7a-485e-b21d-1c3785ab3fbd), because it is the leader (a small sketch after the listing shows the same comparison):

[root]# gluster pool list
UUID					Hostname  	State
91acc359-eee7-4faf-b47b-692351bd3fd9	192.63.1.19 	Connected 
b2157fd6-4d7a-485e-b21d-1c3785ab3fbd	192.63.1.18 	Connected 
5a8c3b1f-21e2-4657-baf7-48fe272fcbfc	192.63.1.110	Connected 
57f9c5fa-2dfa-4fc7-912c-619cfb047170	192.63.1.16 	Connected 
a0a13141-b402-46ca-97a2-5d3703283626	10.63.1.17 	Connected 
13b99272-b7e4-4aee-b3bf-ec8d456c04e8	localhost 	Connected 
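The leader selection can be reproduced standalone. This short sketch hard-codes the UUIDs from the pool list above (all shown as Connected) and picks the lexicographically largest one with a plain string comparison, which is the same rule IsLeader applies:

package main

import "fmt"

func main() {
	// Peer UUIDs copied from the pool list above; all are shown as Connected.
	peers := []string{
		"91acc359-eee7-4faf-b47b-692351bd3fd9",
		"b2157fd6-4d7a-485e-b21d-1c3785ab3fbd",
		"5a8c3b1f-21e2-4657-baf7-48fe272fcbfc",
		"57f9c5fa-2dfa-4fc7-912c-619cfb047170",
		"a0a13141-b402-46ca-97a2-5d3703283626",
		"13b99272-b7e4-4aee-b3bf-ec8d456c04e8",
	}
	var leader string
	for _, id := range peers {
		if id > leader { // plain (lexicographic) string comparison, as in IsLeader
			leader = id
		}
	}
	fmt.Println("leader:", leader) // prints b2157fd6-4d7a-485e-b21d-1c3785ab3fbd
}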

The relevant code is in https://github.com/gluster/gluster-prometheus/blob/master/pkg/glusterutils/exporterd.go:

// IsLeader returns true or false based on whether the node is the leader of the cluster or not
func (g *GD1) IsLeader() (bool, error) {
	setDefaultConfig(g.config)
	peerList, err := g.Peers()
	if err != nil {
		return false, err
	}
	peerID, err := g.LocalPeerID()
	if err != nil {
		return false, err
	}
	var maxPeerID string
	//This for loop iterates among all the peers and finds the peer with the maximum UUID (lexicographically)
	for i, pr := range peerList {
		if pr.Online {
			if peerList[i].ID > maxPeerID {
				maxPeerID = peerList[i].ID
			}
		}
	}
	//Checks and returns true if maximum peerID is equal to the local peerID
	if maxPeerID == peerID {
		return true, nil
	}
	return false, nil
}

limiao2008 commented on Dec 11 '20

@khalid151 @Neraud @onnorom @csabahenk There's no problem; deploying on all nodes lets you get the volume profile metrics. See https://github.com/gluster/gluster-prometheus/issues/147#issuecomment-743010344

limiao2008 commented on Dec 11 '20

Hello, I followed your method, but it did not solve the problem. My volume options:

Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
auth.allow: *
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

gluster-exporter.toml

[globals]
gluster-cluster-id = ""
gluster-mgmt = "glusterd"
glusterd-dir = "/var/lib/glusterd"
gluster-binary-path = "gluster"
# If you want to connect to a remote gd1 host, set the variable gd1-remote-host
# However, using a remote host restrict the gluster cli to read-only commands
# The following collectors won't work in remote mode : gluster_volume_counts, gluster_volume_profile 
#gd1-remote-host = "localhost"
gd2-rest-endpoint = "http://localhost:24007"
port = 9713
metrics-path = "/metrics"
log-dir = "/var/log/gluster-exporter"
log-file = "exporter.log"
log-level = "info"
# cache-ttl-in-sec = 0, disables caching
cache-ttl-in-sec = 30
# by default caching is turned off
# to enable caching, add the function-name to 'cache-enabled-funcs' list
# supported functions are,
# 'IsLeader', 'LocalPeerID', 'VolumeInfo'
# 'EnableVolumeProfiling', 'HealInfo', 'Peers',
# 'Snapshots', 'VolumeBrickStatus', 'VolumeProfileInfo'
cache-enabled-funcs = [ 'IsLeader', 'LocalPeerID', 'VolumeInfo' ]

[collectors.gluster_ps]
name = "gluster_ps"
sync-interval = 5
disabled = false

[collectors.gluster_peer_counts]
name = "gluster_peer_counts"
sync-interval = 5
disabled = false

[collectors.gluster_peer_info]
name = "gluster_peer_info"
sync-interval = 5
disabled = false

[collectors.gluster_brick]
name = "gluster_brick"
sync-interval = 5
disabled = false

[collectors.gluster_brick_status]
name = "gluster_brick_status"
sync-interval = 15
disabled = false

[collectors.gluster_volume_counts]
name = "gluster_volume_counts"
sync-interval = 5
disabled = false

[collectors.gluster_volume_status]
name = "gluster_volume_status"
sync-interval = 5
disabled = false

[collectors.gluster_volume_heal]
name = "gluster_volume_heal"
sync-interval = 5
disabled = false

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

I don't have the gluster_thinpool_metadata_* metrics. @limiao2008
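Since profiling is enabled on the volume and the collector is enabled in the config, one more thing worth checking is whether this particular node is the one the exporter considers the leader, because only the leader collects profile data. The sketch below is only a diagnostic aid, not exporter code; it assumes GD1/glusterd, that the local peer UUID is stored as a UUID= line in /var/lib/glusterd/glusterd.info, and that the gluster binary is on PATH. It compares the local UUID with the highest UUID among connected peers in gluster pool list:

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// localUUID reads the local peer UUID from glusterd's state file.
func localUUID() (string, error) {
	f, err := os.Open("/var/lib/glusterd/glusterd.info")
	if err != nil {
		return "", err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "UUID=") {
			return strings.TrimPrefix(s.Text(), "UUID="), nil
		}
	}
	if err := s.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no UUID= line found in glusterd.info")
}

func main() {
	local, err := localUUID()
	if err != nil {
		panic(err)
	}
	out, err := exec.Command("gluster", "pool", "list").Output()
	if err != nil {
		panic(err)
	}
	var leader string
	lines := strings.Split(string(out), "\n")
	for _, line := range lines[1:] { // first line is the UUID/Hostname/State header
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[2] == "Connected" && fields[0] > leader {
			leader = fields[0]
		}
	}
	fmt.Println("local UUID :", local)
	fmt.Println("leader UUID:", leader)
	fmt.Println("this node should export profile metrics:", local == leader)
}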

vast0906 commented on Dec 03 '21