VisualRegressionTests.jl Best practices (and how to get there)

Testing visual output is a complex art... much trickier than most testing, mainly because of the fuzzy nature of what constitutes a "pass". I want to discuss best practices for visual testing in Julia... not what people do now, but rather what we should do. There are a few issues for which we need good solutions:

How fuzzy should the comparison be? This was the main focus of the initial version(s) of VisualRegressionTests (VRT), but still requires thought. Some images will match almost exactly, others will only be similar. Ideally we have a method to fine-tune the fuzziness of each test in a semi-automated way.
Where are the reference images stored? Some options:
- In another GitHub repo. This is what I currently do; that repo is PlotReferenceImages.jl. For tests to work, one must install that repo. For Travis I clone it as part of a setup script.
- Bundled with the repo we are trying to test. I started with this, and the size of the repo exploded. (see https://github.com/tbreloff/Plots.jl/issues/264)
- Uploaded somewhere. But... where? And can it easily be version controlled?
How do we properly match reference images to the tagged releases? Should the images be tagged in tandem? (But what if we are testing many packages with the same image repo?) Should we maintain a directory structure with the version info? (This is what I do for Plots.)
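To make the "directory structure with the version info" option concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `parse_version` and `pick_reference_dir` are hypothetical, the directories are assumed to be named after plain dotted versions such as `0.7.0`, and the fallback rule (use the newest reference version that is not newer than the package version) is just one possible policy, not necessarily what Plots does.

```python
def parse_version(text):
    """Turn a dotted version string such as "v0.7.3" into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def pick_reference_dir(available, package_version):
    """Pick the reference-image directory to use for a package version.

    `available` is a list of directory names, each a dotted version.
    Assumed policy: the newest reference version that is not newer
    than the package version under test.
    """
    target = parse_version(package_version)
    candidates = [v for v in available if parse_version(v) <= target]
    if not candidates:
        raise LookupError(
            "no reference images at or before version " + package_version
        )
    # Compare as integer tuples so that 0.10.0 sorts after 0.7.0.
    return max(candidates, key=parse_version)
```

With directories `0.6.0`, `0.7.0` and `0.10.0`, a test run for package version `0.7.3` would pick `0.7.0` under this policy, so old releases keep testing against the images that were current when they were tagged.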

ref: https://github.com/JuliaLang/METADATA.jl/pull/5545 https://github.com/JuliaLang/METADATA.jl/pull/5494

cc: @tkelman @mfalt

Jul 05 '16 13:07 tbreloff

I definitely don't have all the answers, but I have some ideas and some thoughts on how it works for me and how I plan for it to work with ControlSystems.jl:

I am currently tuning the fuzziness manually, which is not optimal. The approach was to run the code locally with PyPlot and then on Travis with PyPlot (we only test with PyPlot). There would be some differences between the two outputs, probably because of matplotlib. The tolerance was then set to a multiple of this difference (a factor of 2 in my case). Maybe something similar can be done automatically?
I think that having the references in a separate repo would make sense, and there is little reason for it to be a julia repo. A simple git clone could then suffice, without the need of using the package manager.
A nice way to make sure that the correct images are used for the respective branches/versions of the code could be to have a similar version system on the reference repo, with the tests referring to an appropriate version. That way, when changes are made such that the references are no longer expected to match, you can explicitly say that you need new references, without breaking the tests for old versions. I assume that VisualRegressionTests could, for example, suggest creating a new branch (on the references) when a new tag is discovered (on the main package).
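The tolerance tuning described above could be automated roughly like this. This is only a sketch in Python under stated assumptions: images are plain nested lists of grayscale values, the difference measure is a mean absolute per-pixel difference (the actual metric is not specified above), and the names `mean_abs_diff`, `calibrate_tolerance` and `images_match` are hypothetical, not part of VisualRegressionTests.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute per-pixel difference of two equally sized images.

    Images are nested lists (rows of grayscale values). This metric is
    an assumption chosen for illustration.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def calibrate_tolerance(local_img, ci_img, factor=2.0):
    """Set the tolerance to `factor` times the difference observed between
    a local render and a CI render of the same plot."""
    return factor * mean_abs_diff(local_img, ci_img)


def images_match(reference, candidate, tolerance):
    """A test passes when the candidate is within tolerance of the reference."""
    return mean_abs_diff(reference, candidate) <= tolerance
```

The calibration would be run once per test whenever references are regenerated, and the resulting tolerance stored next to the reference image, so each test gets its own fuzziness instead of one global value.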

Jul 05 '16 15:07 mfalt

Using git to download a binary resource is overkill and not really necessary. You probably don't need version control at the download site, only at the repository level. Just download an archive of a tagged version, then you can change the tag as needed. Please don't clone master of things, that's really bad for reproducibility and being able to go back to old versions and test them down the road.

Jul 05 '16 20:07 tkelman

> You probably don't need version control at the download site, only at the repository level.

I don't understand what you mean by this. When you say "download site", do you mean the computer which is running a test? When you say "repository level", what do you mean?

> Please don't clone master of things, that's really bad for reproducibility and being able to go back to old versions and test them down the road.

It's no worse than using an image server with a snapshot of the reference images (which is what I think you're suggesting).

Maybe you could be a little more specific with what you're suggesting, and give an example workflow? Where are the images stored? How/when/where do I upload/download? How do we connect a package version/commit with the correct reference image for reproducibility? If the images are served statically, who hosts them?

Jul 05 '16 20:07 tbreloff

You could use a GitHub repo to hold the images if you want; it's not ideal, but it's easy. But where the tests get run, I would be surprised if you needed the full history of all past versions of the data. For downloading the data, rather than a git clone, I'm suggesting downloading and extracting the archive that GitHub can create for you from any tag or sha. The test script would just contain the particular tag or sha to test against, then download and extract that archive.
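As a rough illustration of that workflow, here is a minimal Python sketch that uses only the standard library. The function names are hypothetical, and the owner, repo and tag in the usage note are placeholders; the URL pattern is the `archive/<ref>.tar.gz` download that GitHub serves for a tag or sha.

```python
import io
import tarfile
import urllib.request


def archive_url(owner, repo, ref):
    """URL of the tarball GitHub generates for a tag or commit sha."""
    return f"https://github.com/{owner}/{repo}/archive/{ref}.tar.gz"


def extract_archive(data, dest):
    """Unpack gzipped tarball bytes into the directory `dest`."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        tar.extractall(dest)


def fetch_reference_images(owner, repo, ref, dest):
    """Download one pinned snapshot of the image repo; no git history."""
    with urllib.request.urlopen(archive_url(owner, repo, ref)) as response:
        extract_archive(response.read(), dest)
```

A test script would then pin the reference explicitly, for example `fetch_reference_images("someuser", "SomeReferenceImages", "v0.1.0", "test/refimg")`, and updating the references means changing only that one tag or sha in the script.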

Jul 05 '16 21:07 tkelman