
MultiQC does not recognize raw data (txt) from FastQC when in a collection

Open • jennaj opened this issue 7 years ago • 27 comments

Problem: The select menu does not find the FastQC collection input. However, it does find other individual datasets with the txt datatype. Both tool versions are affected, and as far as I know this was not a problem before the 18.05 pre-release.

Test history: https://usegalaxy.org/u/jen/h/test-history-trimmomatic-trim-galore

Dataset 46 should be in the select list:

[Screenshot: 2018-05-07 11:32:34 AM]

Tool form with other txt collection data in the history. Drilling down into dataset 46 shows that the correct datatype is assigned:

[Screenshot: 2018-05-07 11:37:08 AM] [Screenshot: 2018-05-07 11:38:05 AM]

jennaj • May 07 '18 18:05

Retesting

jennaj • May 16 '18 20:05

Still a problem on retest. Please change the tags back when this is fixed and I'll test again.

jennaj • Jun 04 '18 17:06

You made it a list of pairs (hid 46 in the first screenshot) instead of a list. I think MultiQC has no understanding of a list of pairs.

martenson • Jun 04 '18 17:06

That seems like a collection-in-tool framework problem, because you'll want to run a set of paired-end datasets through FastQC and then through MultiQC.

From the MultiQC wrapper:

```xml
<param name="input" type="data" format="txt" multiple="true" label="FastQC output">
    <validator type="expression" message="MultiQC does not accept the HTML report generated by FastQC, only the Raw Data">value is not None and value.extension != "html"</validator>
</param>
```

It should accept multiple datasets, which I'd claim a list:paired contains. Of course, at the framework level it's somewhat ambiguous how to handle this, e.g. batch each sub-list/pair separately, or do one tool run for all the data; there should probably be multiple runtime-configurable options.
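The two options can be illustrated with a toy Python model. This is not Galaxy's API: `Dataset`, `run_multiqc`, `batch_per_pair` and `single_run` are hypothetical names, and a list:paired collection is assumed to be a dict of sample identifier to a `{"forward": ..., "reverse": ...}` pair. The `accepts` check copies the validator expression from the wrapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a history dataset: a name and a datatype extension."""
    name: str
    extension: str


def accepts(value):
    # Same condition as the validator expression in the MultiQC wrapper.
    return value is not None and value.extension != "html"


def run_multiqc(datasets):
    """Hypothetical stand-in for one MultiQC job; returns the names it would aggregate."""
    if not all(accepts(d) for d in datasets):
        raise ValueError("MultiQC does not accept the HTML report generated by FastQC")
    return [d.name for d in datasets]


def batch_per_pair(list_paired):
    # Option 1: one job per sub-collection (pair), i.e. one report per sample.
    return {sample: run_multiqc([pair["forward"], pair["reverse"]])
            for sample, pair in list_paired.items()}


def single_run(list_paired):
    # Option 2: pool every dataset into a single job, i.e. one report overall.
    return run_multiqc([pair[side] for pair in list_paired.values()
                        for side in ("forward", "reverse")])


collection = {
    "sample1": {"forward": Dataset("s1_R1.txt", "txt"), "reverse": Dataset("s1_R2.txt", "txt")},
    "sample2": {"forward": Dataset("s2_R1.txt", "txt"), "reverse": Dataset("s2_R2.txt", "txt")},
}
print(len(batch_per_pair(collection)))  # 2 jobs, one per pair
print(single_run(collection))  # ['s1_R1.txt', 's1_R2.txt', 's2_R1.txt', 's2_R2.txt']
```

For a QC summary, option 2 is usually what a user wants, which is why a runtime choice between the two would help.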

FastQC works because its input doesn't have multiple="true", so it has access to batch mode.

The current workaround is to flatten the collection manually into a list with collection operations, but this shouldn't really be necessary.
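What flattening does to the structure can be sketched as follows. This is a toy model, not Galaxy's implementation: the nested collection is assumed to be a dict of dicts, and the `"_"` separator used to join element identifiers is an assumption for illustration only.

```python
def flatten(collection, sep="_"):
    """Toy flatten of a nested collection into a flat, list-like dict.

    `collection` maps element identifiers to either datasets (leaves) or nested
    dicts (sub-collections). Nested identifiers are joined with `sep`; the "_"
    default is an assumption, not necessarily what Galaxy's operation uses.
    """
    flat = {}
    for outer_id, element in collection.items():
        if isinstance(element, dict):
            # Sub-collection: flatten it first, then prefix its identifiers.
            for inner_id, dataset in flatten(element, sep).items():
                flat[f"{outer_id}{sep}{inner_id}"] = dataset
        else:
            flat[outer_id] = element
    return flat


list_paired = {
    "sample1": {"forward": "s1_R1_fastqc.txt", "reverse": "s1_R2_fastqc.txt"},
    "sample2": {"forward": "s2_R1_fastqc.txt", "reverse": "s2_R2_fastqc.txt"},
}
print(flatten(list_paired))
# {'sample1_forward': 's1_R1_fastqc.txt', 'sample1_reverse': 's1_R2_fastqc.txt',
#  'sample2_forward': 's2_R1_fastqc.txt', 'sample2_reverse': 's2_R2_fastqc.txt'}
```

The result has a single level, i.e. a plain list, which a `multiple="true"` data input can take in one run.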

blankenberg • Jun 04 '18 19:06

Also reported here: https://github.com/galaxyproject/galaxy/issues/5875

jennaj • Jun 04 '18 20:06

Looks like it's fixed in galaxyproject/galaxy#6255; it just needs to be backported to 18.05. Test was updated yesterday, so we should be able to test there.

natefoo • Jun 05 '18 14:06

So with that fix, if you have a list of pairs, it's going to map over the list:paired collection, meaning you get one MultiQC report per pair. In most circumstances you will still need to unpack your list of pairs. This is because FastQC, the software (not the wrapper), is not paired-end aware. A paired-end aware QC tool would take in a pair and produce a single dataset.
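The effect of mapping over collections on the output structure can be sketched with a toy function. This is a simplified model, not Galaxy's code: it assumes a single-output tool, uses Galaxy-style type strings such as `"list:paired"`, and `mapped_output_type` is a hypothetical name. `consumed` is the collection type one job takes (`None` means a single dataset, as for FastQC).

```python
def mapped_output_type(collection_type, consumed=None):
    """Toy model of the output collection type when a tool is mapped over a collection.

    Returns the output collection type, or None if the tool runs once over everything.
    """
    levels = collection_type.split(":")
    if consumed is None:
        # One job (and one output dataset) per leaf dataset: the structure is kept.
        return collection_type
    inner = consumed.split(":")
    if levels[-len(inner):] != inner:
        raise ValueError(f"cannot map {consumed!r} over {collection_type!r}")
    # Each job consumes the innermost levels; the remaining outer levels wrap the outputs.
    outer = levels[:-len(inner)]
    return ":".join(outer) or None


print(mapped_output_type("list:paired"))            # list:paired (FastQC, per dataset)
print(mapped_output_type("list:paired", "paired"))  # list (one output per pair)
print(mapped_output_type("list", "list"))           # None (a single run)
```

In this model, a multi-data input mapped over list:paired consumes one pair per job, so the outputs form a list with one report per pair, as described above.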

mvdbeek • Jun 05 '18 14:06

Assuming I did this right, it still doesn't see this collection: https://test.galaxyproject.org/u/nate/h/multiqc-input-test

natefoo • Jun 05 '18 14:06

@mvdbeek Hrm, should the collection be selectable as an input now though?

natefoo • Jun 05 '18 14:06

Yes, did you rebuild the client?

mvdbeek • Jun 05 '18 14:06

@mvdbeek the client builds on deploy for Test; it should be up to date.

martenson • Jun 05 '18 14:06

The input datasets are deleted for me, so that may be an issue: [Screenshot: 2018-06-05 16:29:26]

mvdbeek • Jun 05 '18 14:06

they are not deleted for me, both before and after import... 😭

[Screenshot: 2018-06-05 10:31:25] [Screenshot: 2018-06-05 10:31:06]

martenson • Jun 05 '18 14:06

It just switched for me as well.

mvdbeek • Jun 05 '18 14:06

There were some more related fixes, though they should all be in dev.

mvdbeek • Jun 05 '18 14:06

Test is on https://github.com/galaxyproject/galaxy/commit/6911153c307ba25525df7c165b4f10b4260964ac

martenson • Jun 05 '18 14:06

Yeah, something isn't working locally either

mvdbeek • Jun 05 '18 14:06

Also there is a boolean parameter at the bottom that is being swallowed: [Screenshot: 2018-06-05 16:49:31]

mvdbeek • Jun 05 '18 14:06

Just re-tested at test.galaxyproject.org; still a problem, which might be expected.

Test history: https://test.galaxyproject.org:/u/jenjackson/h/test-history-multiqc-at-testgalaxyproject

I tested all three collection types; in none of them is the FastQC .txt output recognized by the tool yet.

[Screenshot: 2018-06-05 11:17:35 AM]

So it's not just the "paired list" and "pair" collection types that MultiQC fails to detect, but also the "list" type.

[Screenshot: 2018-06-05 11:22:12 AM]

jennaj • Jun 05 '18 18:06

Added tags to the tests so others can better understand what those are.

I tried to drag and drop datasets out of a collection. One worked OK (forward); the other shows up weird in the select list. Is that a known issue or just something buggy on Test? I'll see if I can reproduce it on org.

[Screenshot: 2018-06-05 11:32:24 AM]

jennaj • Jun 05 '18 18:06

On Main/org, the same issue with dropped datasets is actually worse: all of them show up in the select list as "Dropped: XXXXXX" rather than the dataset name with (hidden). Does this need a separate ticket, or is it a known issue? @jmchilton

test history: https://usegalaxy.org/u/jen/h/test-history-multiqc-hidden-fastqc-rawdata, dataset 48 is what I tested with dragging the collection files over into MultiQC

[Screenshot: 2018-06-05 11:41:40 AM]

jennaj • Jun 05 '18 18:06

Related, and it should probably be reopened rather than closed: https://github.com/galaxyproject/tools-iuc/issues/1658

jennaj • Jun 05 '18 18:06

A simple list of raw data from FastQC works fine for me with MultiQC: https://usegalaxy.org/u/martenson/h/unnamed-history-2

martenson • Jun 05 '18 18:06

Interesting: dataset 32 has basically the same content and doesn't work. Yours was a new list collection of merged txt output; mine was produced from a list of fastq datasets that was run through FastQC as a collection list input.

I'm not sure what the difference is under the hood, but it's a really good test/comparison. Maybe it will help figure out what is going wrong.

jennaj • Jun 05 '18 23:06

Can we create one issue per problem? I see different servers running different Galaxy versions, with inputs that were dragged or not dragged. This is very confusing. The problem that regular lists can't be selected for multi-data inputs on dev (as is the case for MultiQC) should be fixed by https://github.com/galaxyproject/galaxy/pull/6300.

Selecting pairs or list of pairs in a multidata input can be made possible, but is probably not the right thing to do as it wouldn't do what you would like it to do. The correct thing is to use a QC tool that is paired-end aware and outputs a single report per fastq pair or to unzip the collection and use a regular list.
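The unzip route can be pictured with another toy model (not Galaxy's implementation; `unzip` is a hypothetical name and the dict-of-pairs layout is an assumption). It only shows the shape change: one list:paired collection becomes two plain lists keyed by the same sample identifiers, each of which a regular multi-data input can take.

```python
def unzip(list_paired):
    """Toy unzip of a list:paired collection into two plain lists (forward, reverse).

    Sample identifiers are kept; only the nesting is removed.
    """
    forward = {sample: pair["forward"] for sample, pair in list_paired.items()}
    reverse = {sample: pair["reverse"] for sample, pair in list_paired.items()}
    return forward, reverse


fastqc_raw = {
    "sample1": {"forward": "s1_R1_fastqc.txt", "reverse": "s1_R2_fastqc.txt"},
    "sample2": {"forward": "s2_R1_fastqc.txt", "reverse": "s2_R2_fastqc.txt"},
}
fwd, rev = unzip(fastqc_raw)
print(fwd)  # {'sample1': 's1_R1_fastqc.txt', 'sample2': 's2_R1_fastqc.txt'}
print(rev)  # {'sample1': 's1_R2_fastqc.txt', 'sample2': 's2_R2_fastqc.txt'}
```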

The dragging issues are separate, and if there's no issue on the main galaxy repo we should create an issue there.

mvdbeek • Jun 07 '18 13:06

This isn't on Main yet (I still cannot see the collection txt files). Could we leave this open in usegalaxy-playbook until the changes are implemented, please? Just in case something else comes up during integration testing.

jennaj • Jun 19 '18 02:06

The drag-and-drop problem will need its own ticket. I'll make one if someone else hasn't. Before, I wasn't sure whether it was a Main, Test, or Galaxy problem.

jennaj • Jun 19 '18 02:06