solana icon indicating copy to clipboard operation
solana copied to clipboard

allow staked nodes weight override

Open janlegner opened this issue 2 years ago • 6 comments

Problem

As per issue #26407: The validator may want to prioritize QUIC traffic from a known whitelist of sources and apply its own stake to those transactions.

Summary of Changes

  • New parameter is added to the solana-validator program: --staked_nodes_overrides <PATH/URL>
  • A configuration file is periodically fetched from the given location (filesystem or HTTP)
  • The fetched configuration is applied to the staked nodes weight - effectively allowing whitelisting/prioritizing traffic from specified sources.

Config example

auto_reload: true
stake_map:
  1.1.1.1: 1000000000000000
  2.2.2.2: 4000000000000000

janlegner avatar Aug 01 '22 15:08 janlegner

It might be more useful to have overrides for pubkey -> stake mapping (instead of IP address -> stake). Most known connections will be from peer nodes that have active stake (e.g. previous leaders, or soon to be leaders). If the peer is staked, it's identity pubkey is being used to lookup its stake.

pgarg66 avatar Aug 01 '22 19:08 pgarg66

It might be more useful to have overrides for pubkey -> stake mapping (instead of IP address -> stake). Most known connections will be from peer nodes that have active stake (e.g. previous leaders, or soon to be leaders). If the peer is staked, it's identity pubkey is being used to lookup its stake.

That is a good idea - but what about e.g. RPC nodes without vote pubkey? :thinking: Maybe allowing both IP and pubkey would be a good idea?

EDIT: or identity pubkey?

janlegner avatar Aug 02 '22 14:08 janlegner

I don't like the idea of a new service within the validator that tries to keep this list updated. Instead we only need a couple primitives:

  1. New solana-validator command-line argument to provide the data file
  2. A new admin ipc method to signal a reload of the data file (cc: https://github.com/solana-labs/solana/blob/ec36f0c5dffb36d639bc447a7552edbb8a241aa1/validator/src/admin_rpc_service.rs#L134)

mvines avatar Aug 02 '22 15:08 mvines

I don't like the idea of a new service within the validator that tries to keep this list updated. Instead we only need a couple primitives:

  1. New solana-validator command-line argument to provide the data file
  2. A new admin ipc method to signal a reload of the data file (cc: https://github.com/solana-labs/solana/blob/ec36f0c5dffb36d639bc447a7552edbb8a241aa1/validator/src/admin_rpc_service.rs#L134 )

Thanks for the suggestion, I did not think about using admin signals but that makes a lot of sense. Where do you think we should then move the logic loading the configuration from file/URL? Keeping it in core/src/... seems strange. I do not want to inflate validator/src/main.rs too much, so maybe validator/src/lib.rs?

janlegner avatar Aug 03 '22 07:08 janlegner

I do not want to inflate validator/src/main.rs too much, so maybe validator/src/lib.rs?

Yep that seems great. Putting it in lib.rs will also make it easier to add to solana-test-validator if somebody wants the functionality added there as well one day

mvines avatar Aug 03 '22 15:08 mvines

I did an update behalf of @janlegner on this PR. The yaml configuration is loaded via RPC admin command or as a configuration option at the start of the validator program. The configuration works with the node identities now (/cc @pgarg66) instead of the IPs.

@mvines would you be so kind and provide me a feedback on this PR?

ochaloup avatar Aug 09 '22 15:08 ochaloup

@mvines thank you for the review. I hope I addressed most of the points (details in the comments above).

ochaloup avatar Aug 11 '22 13:08 ochaloup

@mvines @pgarg66 As browsing through the code I would like to check with you one detail.

The goal of this PR is to override the stake amounts (as per pr#26407) to make a validator peer to be considered with some amount of the stake and then from that calculate QUIC streams and packet vote stage. I did it now in way that the mapping is taken from the (https://github.com/solana-labs/solana/blob/v1.11.5/core/src/staked_nodes_updater_service.rs#L75) tvu_peers() and overrde is applied only if the validator identity is valid within the TVU ips. I think that this could be limiting. When the validator happens to be a valid TVU peer? For the QUIC stake overriding purpose of ip/id could I use rather the cluster_info.all_peers()?

Btw. could be two validators running with the same IP? Could be then the ip_to_stake map in clash in values (https://github.com/solana-labs/solana/blob/v1.11.5/core/src/staked_nodes_updater_service.rs#L82)?

ochaloup avatar Aug 11 '22 15:08 ochaloup

@mvines @pgarg66 As browsing through the code I would like to check with you one detail.

The goal of this PR is to override the stake amounts (as per pr#26407) to make a validator peer to be considered with some amount of the stake and then from that calculate QUIC streams and packet vote stage. I did it now in way that the mapping is taken from the (https://github.com/solana-labs/solana/blob/v1.11.5/core/src/staked_nodes_updater_service.rs#L75) tvu_peers() and overrde is applied only if the validator identity is valid within the TVU ips. I think that this could be limiting. When the validator happens to be a valid TVU peer? For the QUIC stake overriding purpose of ip/id could I use rather the cluster_info.all_peers()?

It makes sense to use all_peers(). It'll allow a validator to allocate more QUIC streams to a node that's not a staked validator (e.g. an RPC node, or a client).

Btw. could be two validators running with the same IP? Could be then the ip_to_stake map in clash in values (https://github.com/solana-labs/solana/blob/v1.11.5/core/src/staked_nodes_updater_service.rs#L82)?

It's technically possible for two validators, e.g. behind NAT, using the same IP address. For this reason, we are not using ip_stake_map in QUIC. We have a mechanism to get peer's identity pubkey during QUIC connection setup process, and that's used for looking up id_to_stake (pubkey_stake_map) table.

pgarg66 avatar Aug 11 '22 15:08 pgarg66

It makes sense to use all_peers(). It'll allow a validator to allocate more QUIC streams to a node that's not a staked validator (e.g. an RPC node, or a client).

I see. Thank you @pgarg66 for this great elaboration! I changed the code for all_peers() as suggested (if you can check it would be nice).

ochaloup avatar Aug 11 '22 16:08 ochaloup

It makes sense to use all_peers(). It'll allow a validator to allocate more QUIC streams to a node that's not a staked validator (e.g. an RPC node, or a client).

I see. Thank you @pgarg66 for this great elaboration! I changed the code for all_peers() as suggested (if you can check it would be nice).

Thanks for updating the PR. Looks good to me. Let's wait for @mvines to review it too, before we merge it.

pgarg66 avatar Aug 11 '22 17:08 pgarg66