batou
Improve (secret) data handling and separating model computation
- Environments are evaluated on the controller
- the model computation can happen on individual nodes (assets/agents) to leverage their network environment
- we could even perform the dependency discovery in a distributed fashion this way
- data/secrets are only fed into the agents as needed
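To make the "only as needed" part concrete, here is a minimal sketch of per-agent secret scoping. The data structures and the scoped_secrets helper are purely illustrative, not batou's actual secrets machinery: the controller computes the subset for each host and only ever ships that subset to the corresponding agent.

```python
# Hypothetical sketch: scope secrets per agent so each host only ever
# receives the values its own components require. All names here
# (SECRETS, HOSTS, scoped_secrets, ...) are illustrative, not batou API.

SECRETS = {
    "postgresql.password": "s3cr3t",
    "rabbitmq.password": "also-s3cr3t",
    "ssl.key": "-----BEGIN PRIVATE KEY----- ...",
}

# Which components run on which host, and which secrets each component needs.
HOSTS = {
    "db01": ["postgresql"],
    "queue01": ["rabbitmq"],
    "web01": ["frontend"],
}
COMPONENT_SECRETS = {
    "postgresql": ["postgresql.password"],
    "rabbitmq": ["rabbitmq.password"],
    "frontend": ["ssl.key"],
}


def scoped_secrets(host):
    """Return only the secrets needed by the components deployed to `host`."""
    needed = set()
    for component in HOSTS[host]:
        needed.update(COMPONENT_SECRETS.get(component, []))
    return {key: SECRETS[key] for key in needed}


if __name__ == "__main__":
    # The controller would ship only this subset to the db01 agent.
    print(scoped_secrets("db01"))  # {'postgresql.password': 's3cr3t'}
```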
Separating the model computation is interesting enough that we might want to strive for supporting a variety of models. Aspects of the various approaches:
Running the configure phase on the controller
- probably fastest
- safest
- easier to implement
- harder to access resources from the targets (which is hard anyway: if we model provisioning, we would have to run/realize in multiple phases so that we actually get access to the resources we need to compute the model)
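A rough sketch of what this controller-side variant could look like: all configure phases run locally on the controller and produce a fully resolved work list per host, so agents only execute. The classes and the plan helper are made up for illustration and are not the actual batou implementation.

```python
# Sketch of the "configure on the controller" variant: all configure phases
# run locally and produce a fully resolved work list per host, so agents only
# execute. The classes below are illustrative, not the actual batou code.

class Component:
    def __init__(self, host):
        self.host = host
        self.actions = []

    def configure(self):
        raise NotImplementedError


class PostgreSQL(Component):
    def configure(self):
        # Model computation: decide *what* should exist, not *how* to do it.
        self.actions.append(("install-package", "postgresql-15"))
        self.actions.append(("write-file", "/etc/postgresql/pg_hba.conf"))


def plan(components):
    """Run all configure phases on the controller and group actions by host."""
    work = {}
    for component in components:
        component.configure()
        work.setdefault(component.host, []).extend(component.actions)
    return work


if __name__ == "__main__":
    work = plan([PostgreSQL("db01")])
    # The controller would now ship work["db01"] to the db01 agent, which
    # executes it without ever seeing the rest of the environment or its secrets.
    print(work)
```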
Running the configure phase on one agent
- how does that work if we only have passive agents (APIs, CLIs) on the remote side?
- how does that work if we need the model to provision the agent?
- at the moment every host has to compute its model anyway and we only use the information from the first host to learn about ordering
- is fast for execution: we don't incur network latency for small operations but only trigger the deployment of high-level components; all prerequisite data has been shipped in bulk through the repository shipping method or accessed from the target network environment in the configure phase
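Sketch of this single-agent variant, assuming the repository has already been shipped in bulk: one designated agent computes the whole model and reports only the ordering back, and the controller then issues one coarse trigger per component. All names below are illustrative pseudocode, not batou API.

```python
# Sketch of the "configure on one agent" variant, assuming the repository has
# already been shipped in bulk: a single designated agent computes the full
# model, the controller only learns the resulting ordering and then triggers
# coarse-grained deployments. Illustrative pseudocode, not batou API.

def compute_model_on_agent(repository_path):
    """Runs on the designated agent and returns only the deployment order.

    In the real system this would evaluate the configure phases against data
    available in the target network environment.
    """
    # Hypothetical result: ordered list of (host, component) pairs.
    return [("db01", "postgresql"), ("web01", "frontend")]


def deploy(order):
    """Runs on the controller and issues one high-level call per component."""
    for host, component in order:
        # One coarse trigger per component keeps round trips rare; the agent
        # performs all fine-grained operations locally.
        print(f"trigger {component} on {host}")


if __name__ == "__main__":
    deploy(compute_model_on_agent("/srv/deployment/repository"))
```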
Running separated configure phases on all agents
- easy to control secret propagation
- interesting from a performance aspect
- how to integrate passive agents? their configure phase ends up being run on another host anyway
- how to integrate with bootstrapping/provisioning tasks?
- probably slow due to coordination/latency overhead?
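A sketch of how this per-agent variant could coordinate: each agent evaluates only its own components and reports what they provide and require, and the controller derives an ordering from those reports. Each report is a network round trip, which is where the coordination/latency overhead shows up. Again, this is illustrative only, not batou's actual protocol.

```python
# Sketch of the "configure on every agent" variant: each agent evaluates only
# its own components and reports what they provide/require; the controller
# resolves the ordering from those reports. Illustrative only, not batou's
# actual protocol.

def configure_on_agent(host):
    """Runs remotely on `host`; returns what its components provide/require."""
    reports = {
        "db01": {"provides": {"postgresql"}, "requires": set()},
        "web01": {"provides": set(), "requires": {"postgresql"}},
    }
    return reports[host]


def resolve_order(reports):
    """Controller side: order hosts so that requirements are provided first."""
    remaining = dict(reports)
    provided, order = set(), []
    while remaining:
        ready = [h for h, r in remaining.items() if r["requires"] <= provided]
        if not ready:
            raise RuntimeError("unsatisfiable requirements: %r" % remaining)
        for host in ready:
            provided |= remaining.pop(host)["provides"]
            order.append(host)
    return order


if __name__ == "__main__":
    reports = {host: configure_on_agent(host) for host in ("web01", "db01")}
    print(resolve_order(reports))  # ['db01', 'web01']
```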
Computing the model on the agents seems to be the most complicated, least flexible option and requires careful management of code distribution. The way things currently work it's actually fairly simple because we distribute the agent in a consistent state. However, I guess we only really want computation on the target system in order to acquire data.
I'm still very much in favour of being able to compute a consistent model in the beginning and then pressing 'go'. Obviously 'predict' only gets you so far if the underlying world keeps moving, so there is never a guarantee that the actions we predict will be the ones we meet when actually performing the tasks.
So, an aspect of this is distributed fact gathering, which is woven into the model computation. However, we're still stuck with the part where we'd need to slowly keep rolling the model forward to gather facts that are produced by previous actions and needed by later actions. If we supported that we'd have a pretty powerful tool, but it might be that this is a trap and we should strive for a simpler overall model.
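For illustration, a minimal sketch of that "rolling model" idea: repeatedly (re)configure with the facts known so far, deploy whatever can already be fully computed, feed the facts produced by those actions back in, and iterate. The fact names and components are made up.

```python
# Minimal sketch of the "rolling model" idea: repeatedly (re)configure with
# the facts known so far, deploy whatever can already be fully computed, feed
# the facts produced by those actions back in, and iterate. Fact names and
# components are made up for illustration.

def configure(facts):
    """Return the components whose model can be computed from current facts."""
    ready = ["database"]
    if "database.address" in facts:
        ready.append("application")
    return ready


def deploy(component, facts):
    """Pretend to deploy a component and return the facts it creates."""
    if component == "database":
        return {"database.address": "10.0.0.5:5432"}
    return {}


def rolling_deploy():
    facts, done = {}, set()
    while True:
        pending = [c for c in configure(facts) if c not in done]
        if not pending:
            break
        for component in pending:
            facts.update(deploy(component, facts))
            done.add(component)
    return done, facts


if __name__ == "__main__":
    print(rolling_deploy())
```

The loop only terminates once a pass yields no newly computable components, which also illustrates the trap: the complete plan only emerges while it is being executed.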