
Prevent users from staking on machines that cannot calculate proofs in time

Open · crtahlin opened this issue 1 year ago · 5 comments

Summary

Some machines are too slow to calculate the proofs required for the storage incentive lottery in time. In that case, even though they are staking properly and might hold all the data, they fail to participate fruitfully in the lottery.

There is a way to check a machine's performance using rchash, as described in the docs. But users might not know that they can, or should, do that: either they miss it in the docs, or they use some other installation method (e.g. DAppNode).

The proposed new feature would require users to run at least one rchash calculation on a fully synced node as a prerequisite before they are allowed to stake. Users could override this requirement, allowing them to stake even if the prerequisite is not met.

Motivation

A user might run a full staking Bee node, investing time and resources (stake, electricity, hardware, time), and win nothing without even knowing why: their resources (CPU speed, storage speed) might be insufficient.

Implementation

A full Bee node already knows the result of its last rchash calculation, including how long it took to calculate: [screenshot]

The same field, or another one, could be used to store the duration of an rchash calculation. Whichever field is used, its value should be retrieved when a staking call is executed. If the duration is too long, staking should fail with an informative message stating that the machine would probably not be able to play the lottery. If the field holds no information, staking should fail with an informative message instructing the user to run rchash.

The user could call the staking endpoint with a special argument that overrides the rchash duration check described above.

Also, the response to an rchash call should include a user-friendly field that states explicitly whether the calculation finished within the time required to play in the lottery (perhaps the logic that allows staking could reference this specific field).

Drawbacks

It could affect the onboarding UX negatively, especially if done improperly, e.g. by stopping the user during onboarding without a proper explanation.

crtahlin, Oct 04 '24 13:10

Yes, I missed it, as it's just an endpoint that wasn't of particular interest at the time. I only realised it yesterday, when it was brought up that nodes could miss a lottery. It should be in the docs, with a prominent warning in the staking section. @NoahMaizels

I think it would also be useful to make this a more rigorous feature: nodes would be tested periodically, and the output would be saved on the node and displayed in "status" or similar, with the average, latest, and worst times, etc. Why? In my case, I run the node on a server with other things on it, and they affect performance: the more other things I install, the fewer resources are left for Bee, so I might start with enough but end up with not enough. Some kind of monitoring of this would be a good way for node operators to keep things in check.

0xCardiE, Oct 04 '24 19:10

I would vote against periodically running this rchash check in any internal fashion. If you've ever watched host performance while a node is calculating the hash, you'll know it puts a very noticeable load on both the CPU and the storage subsystem (read: SSD). If two co-resident nodes in different neighborhoods happen to run an rchash check at the same time, both checks will take longer than they should, possibly to the point of reporting themselves as "bad" or whatever the intended outcome is for too-long rchash times.

Outside-the-swarm variable host load is one thing, but intentionally adding such a load (random rchash calculation) on every single swarm node is unnecessary, gains nothing, and wastes resources IMHO.

ldeffenb, Oct 04 '24 20:10

Yes, I didn't consider those situations. Maybe you are right and my suggestion is overkill. But we should add a strong warning telling all node operators to check this regularly, and that they must treat it as part of their node's "health".

0xCardiE, Oct 07 '24 16:10

Also, please make the duration output human-readable; currently it is in nanoseconds: [screenshot]
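A bare integer count of nanoseconds is what Go's `encoding/json` produces when a `time.Duration` is marshalled directly, so that may be where this output comes from (not confirmed here). The sketch below assumes the response carries the duration as nanoseconds under a `duration` key; that key name is an assumption, since the screenshot is not available. `humanDuration` is a hypothetical helper that uses the standard `time.Duration.String` method for formatting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// rchashResponse assumes the duration is serialised as an integer number
// of nanoseconds under the (hypothetical) key "duration".
type rchashResponse struct {
	Duration time.Duration `json:"duration"`
}

// humanDuration decodes the assumed response body and formats the
// duration in a human-readable form such as "2m34.321s".
func humanDuration(body []byte) (string, error) {
	var r rchashResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	return r.Duration.String(), nil
}

func main() {
	s, err := humanDuration([]byte(`{"duration": 154321000000}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

On the server side, the same effect could be achieved by also emitting a seconds value or a formatted string alongside the raw number, so existing clients that read nanoseconds keep working.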

0xCardiE, Oct 07 '24 16:10