Crater API and distributing jobs across machines


In the team structure revamp thread on internals I saw there was interest in making crater bors-controlled, so I thought about it for a bit and came up with a rough idea of what should be done to achieve it.

To be bors-controlled, crater needs some sort of API that bors can communicate with. Since there are multiple crater machines, the most convenient way is to have a central API that manages experiments, plus agents installed on the machines that execute them.

Also, since the agents are going to receive their jobs from the API, it's easy to send individual crates instead of whole experiments, distributing the execution across all the crater machines.
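To illustrate the idea of handing out individual crates, here is a minimal sketch in which threads stand in for the crater machines and an in-memory queue stands in for the central API. The names CentralQueue, next_job and run_agents are made up for this example and are not part of crater.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical central queue: hands out one crate at a time.
#[derive(Clone)]
struct CentralQueue {
    pending: Arc<Mutex<VecDeque<String>>>,
}

impl CentralQueue {
    fn new(crates: &[&str]) -> Self {
        let pending = crates.iter().map(|c| c.to_string()).collect();
        CentralQueue {
            pending: Arc::new(Mutex::new(pending)),
        }
    }

    /// Returns the next crate to test, or None once the queue is empty.
    fn next_job(&self) -> Option<String> {
        self.pending.lock().unwrap().pop_front()
    }
}

/// Runs `agents` workers that pull crates until the queue is empty.
/// Returns an (agent id, crate name) pair for every executed job.
fn run_agents(queue: &CentralQueue, agents: usize) -> Vec<(usize, String)> {
    let handles: Vec<_> = (0..agents)
        .map(|id| {
            let queue = queue.clone();
            thread::spawn(move || {
                let mut done = Vec::new();
                while let Some(krate) = queue.next_job() {
                    // A real agent would build and test the crate here.
                    done.push((id, krate));
                }
                done
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect()
}

fn main() {
    let queue = CentralQueue::new(&["serde", "rand", "log", "regex"]);
    let results = run_agents(&queue, 2);
    println!("{} crates tested: {:?}", results.len(), results);
}
```

Each crate is tested exactly once, but which agent picks up which crate depends on scheduling, so a faster machine naturally ends up taking more of the work.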

I thought about this plan, which doesn't disrupt the current usage of crater:

1. Remove as much as possible from the prepare-ex stage of a crater run: everything crate-specific will be prepared just before that crate is tested. This shouldn't affect current crater usage, but it avoids preparing unnecessary things when a crater run is split across several machines.
2. Add an API that allows the crater crate to be used as a library by the agents and the central API, so those can be developed without touching the existing crater codebase.
3. Implement the crater_agent and crater_api crates out-of-tree:
   - crater_agent will run on the machines, using long-polling requests (thus avoiding any exposed ports) to get new jobs from the central API. The agent will run the job (deleting the previous experiment and running define-ex/prepare-ex if the job belongs to a different experiment than the last one), upload the results to the central API and then request a new job.
   - crater_api will provide two different HTTP/JSON APIs: one to communicate with the agents, and one to manage experiments. The latter can be used either by bors or to build a web interface.
4. Eventually merge the two crates into this repository and deploy them.
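The agent behaviour described above could look roughly like the following sketch. Everything here is hypothetical: the CentralApi trait, the Job struct and the FakeApi stand-in are invented for illustration, the long-polling HTTP request is replaced by a plain method call, and the actual crater commands are only recorded as log lines.

```rust
/// A single unit of work: one crate belonging to one experiment.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    experiment: String,
    krate: String,
}

/// Hypothetical view of the central API as seen by an agent.
trait CentralApi {
    /// Stands in for a long-polling request; None means "no more work".
    fn next_job(&mut self) -> Option<Job>;
    fn upload_result(&mut self, job: &Job, passed: bool);
}

/// The agent only remembers which experiment is prepared on its machine.
struct Agent {
    current_experiment: Option<String>,
    log: Vec<String>,
}

impl Agent {
    fn new() -> Self {
        Agent { current_experiment: None, log: Vec::new() }
    }

    fn run<A: CentralApi>(&mut self, api: &mut A) {
        while let Some(job) = api.next_job() {
            // Switch experiments only when the job belongs to a new one.
            if self.current_experiment.as_deref() != Some(job.experiment.as_str()) {
                if let Some(old) = self.current_experiment.take() {
                    self.log.push(format!("delete {}", old));
                }
                self.log.push(format!("define-ex+prepare-ex {}", job.experiment));
                self.current_experiment = Some(job.experiment.clone());
            }
            // A real agent would build and test the crate here.
            self.log.push(format!("test {}", job.krate));
            api.upload_result(&job, true);
        }
    }
}

/// In-memory stand-in for the HTTP API, used only in this example.
struct FakeApi {
    jobs: Vec<Job>,
    results: Vec<(Job, bool)>,
}

impl CentralApi for FakeApi {
    fn next_job(&mut self) -> Option<Job> {
        if self.jobs.is_empty() { None } else { Some(self.jobs.remove(0)) }
    }

    fn upload_result(&mut self, job: &Job, passed: bool) {
        self.results.push((job.clone(), passed));
    }
}

fn job(experiment: &str, krate: &str) -> Job {
    Job { experiment: experiment.to_string(), krate: krate.to_string() }
}

fn main() {
    let mut api = FakeApi {
        jobs: vec![job("ex-1", "serde"), job("ex-1", "log"), job("ex-2", "rand")],
        results: Vec::new(),
    };
    let mut agent = Agent::new();
    agent.run(&mut api);
    for line in &agent.log {
        println!("{}", line);
    }
}
```

Because the agent always initiates the request, the machines need no exposed ports, and the central API is free to hand consecutive jobs from the same experiment to the same agent to avoid repeated preparation.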

Any thoughts on this? I would be willing to implement this.

Mar 01 '18 17:03 pietroalbini

Great to see your interest here!

> Remove as much as possible from the prepare-ex stage of a crater run: everything crate-specific will be prepared just before that crate is tested. This shouldn't affect the current crater usage, but will allow to avoid preparing unnecessary things if a crater run is split between more machines.

I was thinking a bit about this while reviewing #188. What about a restructuring so that each crate has a queue of operations that it executes, e.g. [prepare, build, test]? I think this is pretty similar to your thought, and it means we can implement the skip-test settings by just removing operations from the list rather than having "if should_skip_crate" checks scattered around the place. In addition, once this restructuring is done it should be quite easy to allow particular phases to execute in parallel (the first crate can be testing while the second is preparing), which will hopefully speed up crater by quite a bit.
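As a rough sketch of that restructuring (the Op enum and CrateTask struct are invented names, not existing crater types), skipping tests becomes a matter of dropping one entry from the queue:

```rust
use std::collections::VecDeque;

/// The phases a crate can go through, in execution order.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Prepare,
    Build,
    Test,
}

/// Hypothetical per-crate task holding its own queue of operations.
struct CrateTask {
    name: String,
    ops: VecDeque<Op>,
}

impl CrateTask {
    fn new(name: &str, skip_tests: bool) -> Self {
        let mut ops: VecDeque<Op> = vec![Op::Prepare, Op::Build, Op::Test].into();
        if skip_tests {
            // The skip-test setting just removes the operation from the queue.
            ops.retain(|op| *op != Op::Test);
        }
        CrateTask { name: name.to_string(), ops }
    }

    /// Executes the queued operations in order and returns what was run.
    fn run(&mut self) -> Vec<Op> {
        let mut executed = Vec::new();
        while let Some(op) = self.ops.pop_front() {
            // The real work for each phase would happen here.
            executed.push(op);
        }
        executed
    }
}

fn main() {
    let mut full = CrateTask::new("serde", false);
    let mut no_tests = CrateTask::new("rand", true);
    let full_ops = full.run();
    let no_tests_ops = no_tests.run();
    println!("{}: {:?}", full.name, full_ops);
    println!("{}: {:?}", no_tests.name, no_tests_ops);
}
```

The code that runs the operations never needs to know about skip settings, and a scheduler could later pop operations from different crates' queues concurrently.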

However, I don't see this as being strictly necessary for an initial implementation of bors-controlled. It'd certainly be independently a big win, but I suspect you can get something working without it.

> 2, 3, 4

You're welcome to just do this all in-tree - we have a 'serve' subdirectory that I'm 99% sure hasn't been used for some time, so you could rename that to agent and just implement both agent and api as crater subcommands. Possibly less effort than trying to refactor as a library.

An additional note: this will (at the final integration step) require some bors integration - rather than integrating with bors directly, configuring custom commands to run in response to bors commands passed in a comment could be preferable.

Mar 03 '18 20:03 aidanhs

PR with the first implementation: #202

Mar 28 '18 18:03 pietroalbini

By the way, how should the bot check if the user is authenticated? I can think of these three options:

1. Have a manual whitelist in config.toml like bors: it works, but I don't think adding another separate list to manage team membership is what we want.
2. Have a list of teams (like rust-lang/release) in config.toml and fetch the membership on the fly: this is the most flexible, but the bot has to be a member of the rust-lang organization to be able to access memberships, and that might be a problem for external contributors.
3. Allow everyone with push access to the repo: this is the simplest and works fine even if an external contributor is testing things on a private repo of theirs. On the rust repo this means everyone who can apply labels can manage crater.

What should I implement? I'm thinking a mix of the first two proposals is the most useful (so teams are managed automatically, but someone can add themselves if they need to test locally), but I'm not the one who should decide that ;)
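A mix of the first two options could be checked roughly like this. It is only a sketch: AuthConfig, TeamLookup and StaticTeams are hypothetical names, the config is assumed to be already parsed from config.toml, the user names are placeholders, and the team lookup that would hit the GitHub API in practice is replaced by a static map.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, already-parsed authorization section of config.toml.
struct AuthConfig {
    /// Manual whitelist of user names (option 1).
    users: HashSet<String>,
    /// Teams whose members are allowed (option 2).
    teams: Vec<String>,
}

/// Hypothetical source of team membership.
trait TeamLookup {
    fn members(&self, team: &str) -> Vec<String>;
}

/// A user is allowed if whitelisted or a member of any configured team.
fn is_authorized<L: TeamLookup>(config: &AuthConfig, lookup: &L, user: &str) -> bool {
    if config.users.contains(user) {
        // Whitelisted users never trigger a team lookup.
        return true;
    }
    config
        .teams
        .iter()
        .any(|team| lookup.members(team).iter().any(|m| m.as_str() == user))
}

/// Static stand-in for fetching team membership on the fly.
struct StaticTeams(HashMap<String, Vec<String>>);

impl TeamLookup for StaticTeams {
    fn members(&self, team: &str) -> Vec<String> {
        self.0.get(team).cloned().unwrap_or_default()
    }
}

/// Builds example data: "bob" is whitelisted, "alice" is on a team.
fn example_setup() -> (AuthConfig, StaticTeams) {
    let mut teams = HashMap::new();
    teams.insert("rust-lang/release".to_string(), vec!["alice".to_string()]);
    let config = AuthConfig {
        users: vec!["bob".to_string()].into_iter().collect(),
        teams: vec!["rust-lang/release".to_string()],
    };
    (config, StaticTeams(teams))
}

fn main() {
    let (config, lookup) = example_setup();
    for user in ["alice", "bob", "mallory"].iter() {
        println!("{}: {}", user, is_authorized(&config, &lookup, user));
    }
}
```

With an empty teams list this degrades to the plain whitelist of option 1, so someone testing locally without organization access only has to add their own name to the whitelist.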

Apr 06 '18 12:04 pietroalbini

Option 1 for now is totally fine - in time we'll move to it being team based, but I want to have appropriate guard rails in place first (e.g. making sure the person requesting a crater run is a member of a rust team). Between (say) three people it's a pretty minor overhead.

Apr 10 '18 18:04 aidanhs
