PackageEvaluator.jl
Make more flexible
For the sake of triggering PkgEval runs automatically to find out how breaking proposed changes might be, I want to make PkgEval more of a library (as opposed to running things manually off of a hacky branch when I want to compare nightlies: https://github.com/JuliaCI/PackageEvaluator.jl/compare/master...tkelman:nightly). Some ideas are floating around my head; I'll most likely be implementing these myself, but I want to write them down now.
Make the entry point take as inputs (see the sketch after this list):
- a list of Julia versions, some set of:
  - x.y-latest release
  - x.y.z specific point release
  - latest nightly
  - x.y.z-sha custom branch build from http://s3.amazonaws.com/julianightlies (building off a branch has to be done by manually triggering the buildbot right now, but would eventually connect that trigger to a comment-listener webhook to make that part automatic too)
- a list of packages:
  - all
  - a subset given by a REQUIRE file
- how many parallel vagrant images to use for the run
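Concretely, the job description could look something like this; all of these names are hypothetical, nothing like it exists in PackageEvaluator today:

```julia
# Hypothetical description of one PkgEval run; field and type names are made up.
immutable JuliaSpec
    kind::Symbol      # :release_series, :point_release, :nightly, or :branch_build
    version::String   # e.g. "0.5", "0.4.7", "" for nightly, or "0.6.0-dev-1a2b3c4"
end

immutable PkgEvalJob
    julia_versions::Vector{JuliaSpec}
    packages::Vector{String}   # explicit names, ["all"], or parsed from a REQUIRE file
    nworkers::Int              # number of vagrant images to run in parallel
end

# e.g. compare the latest 0.5 release against a custom branch build, all packages:
job = PkgEvalJob([JuliaSpec(:release_series, "0.5"),
                  JuliaSpec(:branch_build, "0.6.0-dev-1a2b3c4")],
                 ["all"], 4)
```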
Then for fun, and because I'm an optimization nerd at heart, use timing data from the most recent set of full nightly PkgEval log files to optimally distribute the jobs to each VM worker. We should be able to get approximate measurements of package install time and test time from each log file's creation and modification times.

Each VM worker would be constrained to use a single julia version (with a sequential fallback if we need to test against more julia versions than we have available workers). Approximate the total time per worker as the sum of the test times of the packages assigned to it, plus the install times of those packages and of their dependencies, then solve for the package-to-worker assignment that minimizes the time taken on the slowest worker. This should encourage packages with long BinDeps compile times (like AmplNLWriter, CoinOptServices, Cbc, Ipopt, etc.) to run on the same worker as their dependencies. Should be a fun assignment problem for JuMP (sketched below), and it might even save enough run time to be worth the trouble.
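Roughly the model I have in mind, as a makespan-minimization MILP. This is only a sketch: `pkgs`, `workers`, `install`, `test`, and `deps` are hypothetical inputs that would be built from the log timing data, and the one-julia-version-per-worker detail is left out for brevity.

```julia
using JuMP, Cbc

# Sketch of the package-to-worker assignment problem. `install[p]` and
# `test[p]` would be approximated from each log file's creation/modification
# times (roughly mtime(log) - ctime(log)); `deps[p]` is the set of packages
# that p pulls in. All of these names are hypothetical.
function assign_packages(pkgs, workers, install, test, deps)
    m = Model(solver=CbcSolver())
    @variable(m, x[pkgs, workers], Bin)   # package p is *tested* on worker w
    @variable(m, y[pkgs, workers], Bin)   # package p is *installed* on worker w
    @variable(m, T >= 0)                  # makespan: time used by the slowest worker

    # every package gets tested on exactly one worker
    @constraint(m, [p in pkgs], sum(x[p, w] for w in workers) == 1)
    # testing a package requires installing it and all of its dependencies
    @constraint(m, [p in pkgs, w in workers], y[p, w] >= x[p, w])
    @constraint(m, [p in pkgs, d in deps[p], w in workers], y[d, w] >= x[p, w])
    # each worker's total install + test time bounds the makespan
    @constraint(m, [w in workers],
        sum(install[p] * y[p, w] + test[p] * x[p, w] for p in pkgs) <= T)

    @objective(m, Min, T)
    solve(m)
    # read the assignment back out
    Dict(p => first(w for w in workers if getvalue(x[p, w]) > 0.5) for p in pkgs)
end
```

Since a dependency's install time is only paid once per worker (the `y` variables), the optimum naturally co-locates packages with their expensive shared dependencies.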
While looking at the code for PR #160, it struck me as weird that we reuse the same VM to run tests for multiple packages. I would change this first: in a scenario like this, it's very difficult to isolate issues caused by a previous package's install/build or test (thinking of issue #146, for example).
I do not have any experience with vagrant, but at my current workplace I'm mostly testing with lxc (unshare+aufs) directly, running each test individually off hierarchical snapshots, so that I can discard the entire environment and compare test results without needing cooperation from the test runner itself.
I guess that if OSX is a requirement, we cannot easily do the same. But I'd first change the workflow so that a snapshot of the provisioned VM is taken after julia is ready but before any package is installed, then reuse that image as a base for each test separately to speed things up (see the sketch below). And since each test allocates its own VM, job parallelism becomes trivial again.
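Roughly the loop I have in mind, sketched with vagrant's built-in snapshot support (available since vagrant 1.8); the `run_isolated` name and the in-VM test command are just illustrative:

```julia
# Per-package isolation via vagrant snapshots: provision once, snapshot the
# clean state, then restore it before every single test.
function run_isolated(pkgs)
    run(`vagrant up`)                      # boot and provision the VM (installs julia)
    run(`vagrant snapshot save clean`)     # snapshot: julia ready, no packages yet
    results = Dict{String,Bool}()
    for pkg in pkgs
        run(`vagrant snapshot restore clean`)  # discard whatever the last test left behind
        testcmd = "julia -e 'Pkg.add(\"$pkg\"); Pkg.test(\"$pkg\")'"
        results[pkg] = success(`vagrant ssh -c $testcmd`)
    end
    run(`vagrant destroy -f`)
    return results
end
```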
Right now we rely on the test runner to terminate successfully, which is problematic when things go wrong. If each test were isolated, you could simply kill the VM.
Looking at the tests, I was at first under the impression that PackageEvaluator used the Travis infrastructure, but I guess it's run on a dedicated server?
I'd probably use docker for this if I wanted to do some major Linux-only refactoring. There is actually a benefit of running multiple package tests in the same VM when installation relies on compiling some large library, since any dependent packages can avoid repeating the time it took to do that.
> I guess it's run on a dedicated server?

Correct. PackageEvaluator has basically nothing in common with Travis.
On Sun, Apr 16 2017, Tony Kelman wrote:

> I'd probably use docker for this if I wanted to do some major Linux-only refactoring. There is actually a benefit of running multiple package tests in the same VM when installation relies on compiling some large library, since any dependent packages can avoid repeating the time it took to do that.

Good point.

> > I guess it's run on a dedicated server?
>
> Correct. PackageEvaluator has basically nothing in common with Travis.
Would you mind sharing some stats about the infrastructure? (I'm just curious.) How much space/time does it require for a full run on all 3 major versions?
Do you run it yourself?
What do you want to know? It runs on one node of the same nanosoldier server that does Base Julia's benchmarking. The cron job starts at 12:20 am Eastern every day; check the commit times at https://github.com/JuliaCI/pkg.julialang.org to see how long a run takes. It used to be around 12 hours, but the 0.6 jobs have been taking much longer than the others recently, probably because of the number of deprecation warnings, which makes the logs really long and slows vagrant down.