Testing of pipeline output

olgabot opened this issue 4 years ago • 10 comments

(Moving discussion from slack)

tl;dr: Currently, nf-core pipeline tests only check that all the processes run without errors; they do not verify that the outputs match expected results, as unit tests would. This thread is to address the question: how can we change that?

Related:

  • https://github.com/nextflow-io/nf-hack18/issues/7
  • https://github.com/nf-core/tools/issues/209

Full discussion:

@stevekm:

is there an example of the pipeline-outputs testing? Our group is gonna start working on something similar and I didn't want to reinvent the wheel too much. I was gonna end up just using the basic Python unittest suite to run the pipeline and then compare output files either against saved fixtures or just check for certain attributes. But I wasn't sure if there were other methods in use for that kind of thing, probably similar to these: https://github.com/stevekm/nextflow-ci

@drpatelh :

Don't think we have anything in place to check module outputs yet. Currently we're only checking whether the module itself is working within NF using a minimal test dataset. Some examples here: https://github.com/nf-core/modules/pulls It'll be tricky testing for outputs because there will need to be some sort of tolerance to deal with outputs changing between tool releases.

@ewels

Any inputs or suggestions welcome @stevekm! In short we still need to set this up. Even for the full pipeline runs it would be nice to check for (a) the existence of files after the run and (b) certain attributes. This has been in the back of my mind for ages but I've never got around to figuring out the best way to do it. My preferred implementation would be to have it as a sub-tool of the nf-core tools I think, that's where we keep everything else like this. So if you're going to write something anyway, would be super nice if it could be part of that so that we can use it too :slightly_smiling_face:

@stevekm:

oh unfortunately, whatever I write for work in the short term is gonna be CWL-centric, since that's the chosen framework here for the time being ... :grimacing: If I do come up with anything useful though, I will definitely share

... a few days later ...

@olgabot

Hello all! I’m not sure where to put this but I had a question regarding testing. Currently, the nextflow run -profile test,docker . is mostly a “smoke test” to check if all the processes run without errors. But there are no proper “unit tests” to check that the output was actually correct. I recently ran into this problem because we merged in a PR that created some empty output files but the processes still ran fine. There’s this discussion from Nov 2018: https://github.com/nextflow-io/nf-hack18/issues/7 - has there been any progress on making unit test type things for nf-core pipelines?

@ewels

Um, see the discussion about ten messages up.. :point_up: :grin: (No, but it would be good to put something simple in place - suggestions for appropriate frameworks welcome)

@olgabot

Haha, thanks. My bad for not seeing it. My personal preference would be to have it be part of nf-core rather than Nextflow, because I'd prefer to write Python over Groovy, but that's just me. What can be problematic is that many tools have some element of stochasticity (e.g. STAR), so either a random seed would need to be set, or the test would need to tolerate some amount of difference. The simplest thing I can think of would be to have several example “results” folders hosted somewhere and assert that the diff is zero, but that's the lazy version of me talking. The OCD version of me says it should do assert statements in Python like @stevekm has here: https://github.com/stevekm/nextflow-ci/blob/master/test_flagstat.py and add a couple of tests to at least assert that certain files are non-empty and then check their contents (the non-empty check is implicit in checking the contents).

@ewels

Let’s switch to #tools

From @ewels

@olgabot - about checking output files: originally I was thinking that we could have a YAML file for each pipeline that lists all of the expected output filenames. The minimal test would be to just check that each exists, but we could have keys under each file for different attributes, for example not_empty, filesize, num_lines. Each of those could have a tolerance set too, e.g. num_lines could have a tolerance of 6 or whatever. Then we could have nf-core check_test_results or something, which takes a results folder and a config file and loops through each one. That would be very simple to write but fairly effective, I think. However, after the discussion on #general, I'm wondering if this is reinventing the wheel and if there's already a (Python) testing framework which has this kind of functionality that we could use.

from @olgabot:

At least for tables/dataframes, pandas has a nice pandas.util.testing.assert_almost_equal that lets you set a tolerance. Same with numpy.
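For illustration, a tolerance-based comparison along those lines could look roughly like this, using numpy's assert_allclose and, in current pandas, pandas.testing.assert_frame_equal (the file names and tolerances below are invented):

import pandas as pd
from numpy.testing import assert_allclose
from pandas.testing import assert_frame_equal

# Compare a numeric output table against a saved "known good" fixture,
# allowing small numerical differences (e.g. from stochastic tools).
expected = pd.read_csv("fixtures/gene_counts.csv", index_col=0)
observed = pd.read_csv("results/gene_counts.csv", index_col=0)

# Tolerant dataframe comparison (the rtol/atol arguments need pandas >= 1.1).
assert_frame_equal(observed, expected, check_exact=False, rtol=0.01)

# The same idea for plain arrays.
assert_allclose(observed.to_numpy(), expected.to_numpy(), rtol=0.01, atol=1e-8)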

olgabot avatar Apr 27 '20 20:04 olgabot

I've had a very brief look around and can't find much existing stuff for testing output files (please post suggestions if you know any!). So I'm getting increasingly keen on writing something to do this. To expand on my slack message above, I could see this working as follows:

New nf-core tools check command

nf-core check-results <pipeline> <output-folder>

  • We have a standard place to keep the output metadata config file in the pipeline. Maybe assets/test_outputs.yaml?
  • Other files can exist as well, but this one should correspond to running with just -profile test
  • <pipeline> could be a name of an nf-core pipeline with this file, a directory with a pipeline that has this file, or a metadata file path.
  • <output-folder> should be the publishDir results folder for a pipeline run

Metadata syntax

Could look something like this as a rough example (brainstorming here, suggestions welcome):

Simplest possible - just filenames:

- fastqc:
  - SRR389222_sub1_fastqc.html
  - SRR389222_sub2_fastqc.html
  - zips:
    - SRR389222_sub1_fastqc.zip
    - SRR389222_sub2_fastqc.zip

Adding in a few more metadata attributes to check:

- fastqc:
  - SRR389222_sub1_fastqc.html:
      mimetype: text/html
  - SRR389222_sub2_fastqc.html:
      mimetype: text/html
      content: '<tr><td>Sequence length</td><td>36</td></tr>'
  - zips:
    - SRR389222_sub1_fastqc.zip:
        mimetype: application/zip
        filesize: 512694
        filesize_tolerance: 1000
    - SRR389222_sub2_fastqc.zip:
        mimetype: application/zip
        filesize: 512306
        filesize_tolerance_pct: 5
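To make this concrete, the check loop itself could be quite small. A rough Python sketch follows; the key names simply mirror the draft YAML above, and the schema would obviously evolve (requires PyYAML):

import mimetypes
import os
import sys

import yaml  # PyYAML


def check_entry(entry, base):
    """Recursively check one entry of the draft metadata format above."""
    failures = []
    if isinstance(entry, str):  # bare filename: just check existence
        path = os.path.join(base, entry)
        if not os.path.isfile(path):
            failures.append(f"missing: {path}")
        return failures

    # Single-key mapping: a directory (value is a list) or a file with checks.
    (name, value), = entry.items()
    path = os.path.join(base, name)
    if isinstance(value, list):  # sub-directory
        for child in value:
            failures += check_entry(child, path)
        return failures

    if not os.path.isfile(path):
        return [f"missing: {path}"]
    attrs = value or {}
    if "mimetype" in attrs:
        guessed, _ = mimetypes.guess_type(path)
        if guessed != attrs["mimetype"]:
            failures.append(f"{path}: mimetype {guessed} != {attrs['mimetype']}")
    if "content" in attrs:
        with open(path, errors="replace") as handle:
            if attrs["content"] not in handle.read():
                failures.append(f"{path}: expected content not found")
    if "filesize" in attrs:
        tolerance = attrs.get("filesize_tolerance", 0)
        if abs(os.path.getsize(path) - attrs["filesize"]) > tolerance:
            failures.append(f"{path}: size outside tolerance")
    return failures


if __name__ == "__main__":
    metadata_file, results_dir = sys.argv[1], sys.argv[2]
    with open(metadata_file) as handle:
        metadata = yaml.safe_load(handle)
    problems = [p for entry in metadata for p in check_entry(entry, results_dir)]
    print("\n".join(problems) or "All checks passed")
    sys.exit(1 if problems else 0)

The filesize_tolerance_pct variant, and an option for warning about unexpected files, would slot into the same loop.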

Bonus thoughts

  • Python or Groovy?
    • The easiest route (for me) is the new subcommand above
    • It could be nice added value to do this in Groovy instead, and have a --check-results pipeline param that enables the check at the end of the run. But it probably complicates things and bloats the codebase. (We could always add this later with the same metadata files)
  • Ignore unexpected files? (I think so)
  • Need a command to build / update these metadata files automatically
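The metadata-building command could likewise start as little more than a directory walk over a known-good results folder. Another rough sketch, emitting the draft format above with a default size tolerance:

import os
import sys

import yaml  # PyYAML


def snapshot(directory):
    """Build a draft metadata entry list from an existing results folder."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            entries.append({name: snapshot(path)})
        else:
            # Record the current size with a default tolerance; refine by hand.
            entries.append({name: {"filesize": os.path.getsize(path),
                                   "filesize_tolerance_pct": 5}})
    return entries


if __name__ == "__main__":
    print(yaml.safe_dump(snapshot(sys.argv[1]), sort_keys=False))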

ewels avatar Apr 28 '20 08:04 ewels

I agree that something which integrates as cleanly as possible with the current Nextflow infrastructure would probably be best

stevekm avatar Apr 28 '20 08:04 stevekm

It seemed to me that one of the challenges was just in writing the code to check the output of each file type. That's why in my example I wrote a wrapper around samtools flagstat and then stopped there, because the time investment grew rather large to wrap each tool's files in a class just to more easily access its output. In that regard MultiQC could be useful.
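For illustration, a per-tool wrapper in that spirit can be fairly small. This sketch is loosely modelled on the idea in the linked test_flagstat.py rather than being that file's code:

import re


class Flagstat:
    """Minimal accessor for `samtools flagstat` text output."""

    def __init__(self, path):
        with open(path) as handle:
            self.text = handle.read()

    def _count(self, label):
        # flagstat lines look like: "1954 + 0 mapped (97.70% : N/A)"
        match = re.search(rf"^(\d+) \+ (\d+) {label}", self.text, re.MULTILINE)
        if match is None:
            raise ValueError(f"no '{label}' line found")
        return int(match.group(1))

    @property
    def total(self):
        return self._count("in total")

    @property
    def mapped(self):
        return self._count("mapped")


# Usage in a test might then be as simple as:
# stats = Flagstat("results/sample1.flagstat")
# assert stats.total > 0 and stats.mapped / stats.total > 0.9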

stevekm avatar Apr 28 '20 09:04 stevekm

For reference, the method I hacked together with Python unittest was to just validate the exit status of the pipeline and do custom parsing of the output files:

https://github.com/stevekm/nextflow-ci/blob/master/test_nextflow.py https://github.com/stevekm/nextflow-ci/blob/master/test_flagstat.py
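The general shape of that approach is roughly the following; the pipeline path, profile and output file name are placeholders:

import subprocess
import unittest
from pathlib import Path


class TestPipeline(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Run the pipeline once for the whole test class.
        cls.proc = subprocess.run(
            ["nextflow", "run", "main.nf", "-profile", "test,docker"],
            capture_output=True, text=True)

    def test_exit_status(self):
        self.assertEqual(self.proc.returncode, 0, msg=self.proc.stderr)

    def test_output_file_not_empty(self):
        output = Path("results/sample1.flagstat")  # placeholder output path
        self.assertTrue(output.is_file())
        self.assertGreater(output.stat().st_size, 0)


if __name__ == "__main__":
    unittest.main()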

As another reference, I have been working on basic methods to do similar unit testing of CWL workflows here: https://github.com/mskcc/pluto-cwl/blob/master/tests/test_generate_cBioPortal_file_cwl.py

This seems pretty similar to the issue of unit testing Nextflow outputs, if Python were chosen for it.

Now that the new DSL2 + module system is coming out soon, maybe it would be good to wait until the formal release of those? The Nextflow modules especially seem like something that would help a lot to streamline unit testing for Nextflow.

In my CWL examples there, I am taking advantage of the default cwl-runner behavior of returning a JSON document that gives a definition and description of all the output items defined by the workflow/tool:

https://github.com/mskcc/pluto-cwl/blob/c750d9d23bc56680e7071a9841a50cd0ca6263b6/tests/test_generate_cBioPortal_file_cwl.py#L58-L71

@pditommaso @evanfloden maybe this is a feature we could replicate in Nextflow in order to aid testing like this? Having a method to get a description of the task/workflow outputs like this would help a lot, I think. I'm also not sure whether this is already accessible via Groovy?

@ewels
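For context, the JSON that cwl-runner prints at the end maps each declared output to a CWL File object (fields such as location, size and checksum come from the CWL standard), so a test can assert against it directly. The output id and filenames below are invented:

import json
import subprocess

# Run the workflow and capture the JSON description of its outputs.
completed = subprocess.run(
    ["cwl-runner", "workflow.cwl", "inputs.yml"],  # placeholder filenames
    capture_output=True, text=True, check=True)
outputs = json.loads(completed.stdout)

# Each output is a CWL File object with fields like location, size, checksum.
report = outputs["mutation_report"]  # invented output id
assert report["class"] == "File"
assert report["size"] > 0
assert report["checksum"].startswith("sha1$")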

stevekm avatar Jul 15 '20 10:07 stevekm

The Nextflow dump() channel operator could perhaps be used for debugging / testing.

A few pipelines (eg. sarek) already use this for debugging / introspection. It's not exactly standardised though.

ewels avatar Jul 15 '20 10:07 ewels

@Emiller88 is working on this, currently in the rnaseq pipeline - https://github.com/nf-core/rnaseq/pull/546

ewels avatar May 09 '21 20:05 ewels

@ewels This is lovely for context!

@maxulysse has also got started on sarek. https://github.com/nf-core/sarek/pull/370

I think my plans for it as of now are to flesh this out on rnaseq in this order:

  1. Convert current CI to pytest-workflow (https://github.com/nf-core/rnaseq/pull/546)
  2. Add local subworkflow e2e tests
  3. Add local module tests

I'd like to do those in 3 PRs to keep the scope small.

Then possibly in the future add tests for bin scripts and local groovy code, but I think those are far off.

The functionality that would be nice here in tools is just an nf-core test <tag> -profile docker or something similar.

I'd love any feedback!

edmundmiller avatar May 09 '21 20:05 edmundmiller

I'm following along. I'd love to see an example of how to incorporate unit testing and/or regression testing into my Nextflow DSL2 projects. I'm looking for guidance and happy to follow your recommendations, especially by looking at some example projects.

Where's the best place for me to get started?

As an FYI, I built a similar testing framework, called GpUnit, for the GenePattern Server platform: https://github.com/broadinstitute/GpUnit So as not to reinvent any wheels, at the time I chose to build on top of Ant and JUnit. It's still going strong. Some decisions I am happy with, some not so much.

Good ideas:

  • don't reinvent the wheel; it's a good idea to integrate with an existing test framework
  • declarative, human-readable test cases, committed to GitHub; GpUnit declares tests in a (schema-less) YAML format; even though the guts of the system is JUnit, it's very simple to create the tests; the implementation of the test framework is decoupled from the declaration of the test cases
  • keep it simple; it's easy to get started; by default expect one regression-type test, consisting of a known good run of your module
    • the simplest case checks the exit code
    • the next simplest case checks for file names; e.g. did my module output a file named 'my-output.txt'?
    • it's also possible to diff actual output with expected output, on a file-by-file basis
    • you can also do a recursive diff of an entire directory tree
    • it's modular: the diff function can be customized to handle, for example, small differences in numerical output

Not-so-good ideas:

  • each test is a full round-trip to the SaaS server
    • this is pretty good for stress testing the server, including the REST API, job engine, etc.
    • this is a problem if the server is down; tests will fail
    • or the module under test is not installed on the server; the test will fail

Challenges:

  • recursive directory diff is not sufficient to cover all test cases
  • documentation. In general it's difficult to document software systems; some of the finer points of the system are buried in comments in the source code, in Slack messages, old email messages, etc.
  • make it easy to get started; for example by making a template project that requires one (and only one) test-case to exist and to pass before your CI server gives you the green light

peterjcarr avatar Jul 02 '21 21:07 peterjcarr

pytest-workflow is running in production in Sarek now: https://github.com/nf-core/sarek/tree/dev/tests

I think we should add this to the pipeline template.
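Alongside the YAML-declared checks, pytest-workflow also allows custom Python tests. Assuming its documented hooks (the workflow marker and the workflow_dir fixture), a supplementary check could look like this, with the test name and paths as placeholders:

import pathlib

import pytest


# Runs only after the workflow declared as "rnaseq default test" in a
# pytest-workflow YAML file has finished; name and paths are placeholders.
@pytest.mark.workflow("rnaseq default test")
def test_multiqc_report_mentions_fastqc(workflow_dir):
    report = pathlib.Path(workflow_dir, "results", "multiqc", "multiqc_report.html")
    assert report.is_file()
    assert "FastQC" in report.read_text(errors="replace")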

grst avatar Jan 25 '22 19:01 grst

I think the new cat of the failed pytest-workflow output was the final piece of the puzzle: https://github.com/nf-core/modules/pull/1226

edmundmiller avatar Jan 26 '22 17:01 edmundmiller

Now being handled with nf-test.

ewels avatar Jun 27 '23 08:06 ewels