Enable multiple worker execution
🌟 What is the purpose of this PR?
Currently, the number of workers is hardcoded to 1. To scale well, the number of workers should be adjustable so that tasks can run in parallel on different threads (and, in the case of Python, in different processes).
⚠️ Known issues
- Simulations that remove agents do not reliably pass yet. This is probably because we remove groups by index rather than by ID, so when a batch other than the last one is removed, the indices of the other batches change. For worker indices, the second bullet under "What does this change?" provides a fix.
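
To illustrate the suspected cause, here is a minimal sketch of how removing by index goes wrong once earlier elements are removed, compared with removing by ID. The `Batch` type and the helper functions are hypothetical and only stand in for the engine's real types; this is not the engine's code.

```rust
/// Hypothetical batch type; only the stable `id` matters for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    id: u32,
}

/// Removes batches by position. Each `Vec::remove` shifts later elements to
/// the left, so indices collected before the first removal may point at the
/// wrong batch afterwards.
fn remove_by_index(batches: &mut Vec<Batch>, indices: &[usize]) {
    for &index in indices {
        if index < batches.len() {
            batches.remove(index);
        }
    }
}

/// Removes batches by their stable ID, which does not depend on positions.
fn remove_by_id(batches: &mut Vec<Batch>, ids: &[u32]) {
    batches.retain(|batch| !ids.contains(&batch.id));
}

/// Four batches with IDs 0..=3, stored at the matching indices.
fn sample() -> Vec<Batch> {
    (0..4).map(|id| Batch { id }).collect()
}

fn main() {
    // Intent in both cases: remove the batches with IDs 0 and 1.
    let mut by_index = sample();
    remove_by_index(&mut by_index, &[0, 1]);

    let mut by_id = sample();
    remove_by_id(&mut by_id, &[0, 1]);

    // Index-based removal drops IDs 0 and 2; ID-based removal drops 0 and 1.
    println!("by index: {:?}", by_index);
    println!("by id:    {:?}", by_id);
}
```

Removing by ID (or removing indices in descending order) avoids the shift; which option fits the engine best is left open here.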
🚫 Blockers
- ~~Better separation of crates (internal)~~
🔍 What does this change?
- Remove the hardcoded setting of the number of workers to `1`
- Fix the removal of active workers in `DistributionController`. Previously, when a worker with a lower index (e.g. `0`) was removed, the indices of the other workers changed. To fix this, I replaced the `Vec` in the `active_workers` field with a `Set`. Alternatively, we could have searched the vector for the worker ID instead. Done in cca9b2b04859167cb10da15af28e63c8d9d5132a at packages/engine/src/workerpool/pending.rs

  This enables running simulations with agents distributed across all workers (i.e. no workers have "empty" tasks). This leads to:
  - Only create behavior tasks if the task would have agents associated with it: a2b1b94f2c49b760093445bc07889e078ae4616a
- Add an `#[instrument]` and `debug!` output to track the current `worker_index`: 560d61977184e75c9299b69a7d155b168083c818
- When multiple workers are enabled, some workers may have no batches assigned. c1fffd891ab343f5788fb0af7473206fa76138d1 handles empty batches correctly when new agents are added while running context packages. It also sets the `worker_index` on batches correctly so that it is picked up when distributing new agents.
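
The `active_workers` change can be sketched roughly as follows. This is a simplified illustration, not the code in `pending.rs`: it assumes the set is a `std::collections::HashSet`, and `PendingWorkers`, `WorkerIndex`, `remove_worker`, and `is_done` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the engine's worker identifier.
type WorkerIndex = usize;

/// Simplified tracker for the workers that still have to report back.
struct PendingWorkers {
    active_workers: HashSet<WorkerIndex>,
}

impl PendingWorkers {
    fn new(workers: impl IntoIterator<Item = WorkerIndex>) -> Self {
        Self {
            active_workers: workers.into_iter().collect(),
        }
    }

    /// Marks a worker as finished. Returns `false` if it was not active.
    /// Removing one worker never changes how the others are addressed.
    fn remove_worker(&mut self, worker: WorkerIndex) -> bool {
        self.active_workers.remove(&worker)
    }

    /// Returns `true` once every worker has been removed.
    fn is_done(&self) -> bool {
        self.active_workers.is_empty()
    }
}

fn main() {
    let mut pending = PendingWorkers::new([0, 1, 2]);

    // Removing the worker with the lowest index first...
    assert!(pending.remove_worker(0));
    // ...does not affect worker 2, which is still found under its own index.
    assert!(pending.remove_worker(2));
    // A second removal of the same worker is reported instead of panicking.
    assert!(!pending.remove_worker(2));

    assert!(!pending.is_done());
    assert!(pending.remove_worker(1));
    assert!(pending.is_done());
}
```

With a `Vec` and position-based removal, removing worker `0` first would have shifted worker `2` to position `1`; the set keeps each worker addressable by its own index regardless of removal order.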
📜 Does this require a change to the docs?
No
🔗 Related links
- Asana task (internal)
🛡 What tests cover this?
The existing integration tests should cover this well enough.
❓ How to test this?
Run different simulations with multiple workers enabled. Simulations that add or remove agents are especially important to check.