massa
massa copied to clipboard
Questions Re Factory Worker
- Have you considered merging the two factory threads? I think having two could lead to weird behavior under load(since the timeout doesn't guarantee wake-up on time).
- Why don't you wait on the condvar for the draw cache? It seems that waking up before the draw has been added to the cache could lead to un-necessarily missing a slot. I think it would be better to wait on the condvar, and when it wakes up "check the time" to make sure you are not early/later(and if early maybe do another
wait_timeout
, or just busy loop for a little bit of time instead).
cc @massalabs/core-team
Hi !
-
there is no good reason to fuse them... On the contrary, since we have a clear separation of concerns (the two factories don't have the same function and don't interact at all) we could even split them into 2 completely separate modules, which will facilitate tests as well. Note that the factory wakeups were designed in a way that prevents double staking on clock adjustment, and allow reasonable (non-perfect) precision timings
-
the only case in which we might need to wait would be if a slot draw we need is still not available (2 cycles of non-finality !) this is a catastrophic failure as it gets out of the bounds of our theoretical security model
the two factories don't have the same function
They both have the function of producing objects at a given slot, they also do pretty much the same thing in terms of communicating with the other modules.
So at each slot, a single factory could produce both objects.
Especially so as the production it not CPU intensive, while it does require similar communication with other modules(no need for parallelism, and better to have only one thread do all the communication).
For example, the two calls to selector.get_selection
and selector.get_producer
are accessing the same underlying data.
So you could do something like:
- Get all the data you need
- Check the wallet
- Produce endorsement
- Produce block
From a single thread.
we could even split them into 2 completely separate modules, which will facilitate tests as well
You can split (parts of) them and test them separately, while running both on the same thread(and test the thing as a whole as well).
For example you could test two separate process_slot_for_block(all the data you need)
and process_slot_for_endorsement(all the data you need)
functions without any threading, even without any communication with other modules.
Note that the factory wakeups were designed in a way that prevents double staking on clock adjustment, and allow reasonable (non-perfect) precision timings
I don't know exactly what you mean by that, I can only tell you that there is zero guarantee as to when either the threads wakes-up after the timeout is reached. Having only a single thread waiting on the timeout removes half of that uncertainty.
I'm not saying you need to wait, just that in both cases the first thing done is getting the producers info from selector
, so you might as well wake-up when selector updates that info, it would seem.
I think point 1 is more important.
the two factories don't have the same function
They both have the function of producing objects at a given slot, they also do pretty much the same thing in terms of communicating with the other modules.
It's arguable...
The block factory wakes up at genesis_timestamp + S.period*t0 + thread*t0/T
to produce a block at slot S
To do so, it interrogates the endorsement pool, the operation pool, Execution, Consensus, then it sends the block to Consensus.
The endorsement factory wakes up at genesis_timestamp + S.period*t0 + thread*t0/T - t0/2
to produce endorsements to be included at slot S
.
To do so, it interrogates Consensus, then it sends the endorsement to the endorsement pool.
So at each slot, a single factory could produce both objects.
Especially so as the production it not CPU intensive, while it does require similar communication with other modules(no need for parallelism, and better to have only one thread do all the communication).
For example, the two calls to
selector.get_selection
andselector.get_producer
are accessing the same underlying data.So you could do something like:
1. Get all the data you need 2. Check the wallet 3. Produce endorsement 4. Produce block
From a single thread.
we could even split them into 2 completely separate modules, which will facilitate tests as well
You can split (parts of) them and test them separately, while running both on the same thread(and test the thing as a whole as well).
For example you could test two separate
process_slot_for_block(all the data you need)
andprocess_slot_for_endorsement(all the data you need)
functions without any threading, even without any communication with other modules.Note that the factory wakeups were designed in a way that prevents double staking on clock adjustment, and allow reasonable (non-perfect) precision timings
I don't know exactly what you mean by that, I can only tell you that there is zero guarantee as to when either the threads wakes-up after the timeout is reached. Having only a single thread waiting on the timeout removes half of that uncertainty.
I'm not saying you need to wait, just that in both cases the first thing done is getting the producers info from
selector
, so you might as well wake-up when selector updates that info, it would seem.I think point 1 is more important.
I understand your points. On the other hand, in the light of https://github.com/massalabs/massa/discussions/2895 I think it's not worth focusing on this architectural change right now before we decide on 2895
Especially so as the production it not CPU intensive, while it does require similar communication with other modules(no need for parallelism, and better to have only one thread do all the communication).
About that, it might be a way to solve that without a big refacto for now. I'm pretty sure that tokio permits to create a runtime with "maximum 2 threads" (plus the scheduler I guess). So we can make that tow thread concurrent since a parallelism is needed, in the worst case.
In addition, we could add the selection of endorsers in that runtime. It would wake on a cycle change and sleep until a new info. But that would be an architectural change bigger.
Probably the OS already use these sleeping threads smarter than we could implement that after all.
solved by refactoring