solana icon indicating copy to clipboard operation
solana copied to clipboard

Add vote instruction count metrics

Open HaoranYi opened this issue 1 year ago • 9 comments

Problem

In https://github.com/solana-labs/solana/pull/24696, we removed the processed vote instruction counters because of spamming.

However, it is useful to know how many vote instructions per second are processed by the validator.

Summary of Changes

Add vote instruction counter metrics.

Fixes #

HaoranYi avatar Jan 30 '24 17:01 HaoranYi

image

We are processing about 3K vote programs per second.

HaoranYi avatar Jan 30 '24 22:01 HaoranYi

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (b1f8a89) 81.6% compared to head (cc1030e) 81.6%. Report is 108 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #35015   +/-   ##
=======================================
  Coverage    81.6%    81.6%           
=======================================
  Files         830      830           
  Lines      224841   224854   +13     
=======================================
+ Hits       183486   183524   +38     
+ Misses      41355    41330   -25     

codecov[bot] avatar Jan 30 '24 23:01 codecov[bot]

We may port this to BPF in the future, will this strategy work in a world where the program is a standard bpf program?

sakridge avatar Feb 01 '24 13:02 sakridge

lazy_static is not ideal also, but I can see the benefit of accumulating the counters closer to where we do the actual vote processing

sakridge avatar Feb 01 '24 13:02 sakridge

We may port this to BPF in the future, will this strategy work in a world where the program is a standard bpf program?

No, it won't work with a standard bpf program. But it will work for any other builtin programs.

For standard bfp program, we could collect and report the matrics at a higher level in the call stack here.

HaoranYi avatar Feb 01 '24 16:02 HaoranYi

lazy_static is not ideal also, but I can see the benefit of accumulating the counters closer to where we do the actual vote processing

Yes. Not ideal. I had thought about to attach the metrics to invoke_context. But it will affect all program execution. So I decided to keep it local and use lazy_static.

HaoranYi avatar Feb 01 '24 16:02 HaoranYi

lazy_static is not ideal also, but I can see the benefit of accumulating the counters closer to where we do the actual vote processing

What are the objections to lazy_static here?

jeffwashington avatar Feb 01 '24 18:02 jeffwashington

lazy_static is not ideal also, but I can see the benefit of accumulating the counters closer to where we do the actual vote processing

What are the objections to lazy_static here?

lazy_static is like global variable. And global varaible make code smells.

HaoranYi avatar Feb 01 '24 20:02 HaoranYi

I'm not familiar with these metrics/how they are used, so I've removed myself as a reviewer. I defer to the subject matter experts here.

brooksprumo avatar Feb 02 '24 16:02 brooksprumo

I'm fine with the lazy_static, just wanted to make sure we considered all the options since as haoran said it's like a global variable and we should avoid it if possible. And it could give incorrect metrics within the same process if you had multiple validators running.

sakridge avatar Feb 23 '24 18:02 sakridge

Yes, it won't give you vote instruction metrics for individual validators. But it can provide the aggregated vote metric all the validators. The main usage of this metric is to monitor validator running on mainnet or testnet, which is unlikely to have multiple validators running in the same process. Do you think it is worth merging into master? @sakridge @jeffwashington @Lichtso

HaoranYi avatar Feb 23 '24 20:02 HaoranYi

Yes, it won't give you vote instruction metrics for individual validators. But it can provide the aggregated vote metric all the validators.

I think it will 'give you vote instruction metrics for individual validators', as long as you aren't running multiple validators concurrently in the same process, right?

jeffwashington avatar Feb 23 '24 20:02 jeffwashington

@behzadnouri had mentioned these metrics would be useful to restore, I believe.

jeffwashington avatar Feb 23 '24 20:02 jeffwashington

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave

willhickey avatar Mar 03 '24 04:03 willhickey

Moved to agave #90

HaoranYi avatar Mar 05 '24 15:03 HaoranYi