usegalaxy-playbook
MultiQC does not recognize raw data (txt) from FastQC when in a collection
Problem: The select menu does not find the FastQC collection input. However, it will find other individual datasets with the txt datatype. Both tool versions are affected, and as far as I know this was not a problem before the 18.05 pre-release.
Test history: https://usegalaxy.org/u/jen/h/test-history-trimmomatic-trim-galore
Dataset 46 should be in the select list.
Screenshot: tool form with other txt collection data in the history. Drilling down into dataset 46 shows that the correct datatype is assigned.
Retesting
Still a problem on retest. Please change the tags back when this is fixed and I'll test again.
You made it a list of pairs (hid 46 in the first picture) instead of a list. I think MultiQC has no understanding of a list of pairs.
That seems like a collection-in-tool framework problem, because you'll want to run a set of paired-end datasets through FastQC and then through MultiQC.
From the MultiQC wrapper:
```xml
<param name="input" type="data" format="txt" multiple="true" label="FastQC output">
    <validator type="expression" message="MultiQC does not accept the HTML report generated by FastQC, only the Raw Data">value is not None and value.extension != "html"</validator>
</param>
```
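For reference, the validator's expression only inspects one selected dataset at a time: it rejects an empty selection and anything whose extension is `html`. The plain-Python sketch below evaluates that same boolean expression against a hypothetical `Dataset` stand-in (not Galaxy's real dataset class) to show what it accepts; it says nothing about whether a collection is offered in the select list in the first place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a Galaxy dataset; only `extension` matters here."""
    name: str
    extension: str


def fastqc_input_ok(value: Optional[Dataset]) -> bool:
    # Same boolean expression as the wrapper's <validator type="expression">
    return value is not None and value.extension != "html"


inputs = [
    Dataset("sample1_fastqc_raw_data", "txt"),
    Dataset("sample1_fastqc_webpage", "html"),
    None,
]
print([fastqc_input_ok(d) for d in inputs])  # [True, False, False]
```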
It should accept multiple datasets, which I'd argue a list:paired contains. Of course, at the framework level it's a bit ambiguous how to handle this, e.g. batch each sub-list/pair separately, or do one tool run for all the data; there should probably be multiple runtime-configurable options.
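To make the ambiguity concrete, here is a minimal plain-Python model, not Galaxy code. It assumes a list:paired collection can be represented as a nested dict keyed by sample, and `run_multiqc` is a hypothetical stand-in for a single MultiQC job that produces one report from all of its inputs. The two functions contrast the options named above: batching each pair separately versus one run over all the data.

```python
from typing import Dict, List

# Hypothetical model of a list:paired collection: {sample: {"forward": ..., "reverse": ...}}
ListPaired = Dict[str, Dict[str, str]]


def run_multiqc(datasets: List[str]) -> str:
    """Hypothetical stand-in for one MultiQC job: one report summarising all inputs."""
    return "report(" + ", ".join(datasets) + ")"


def map_over_pairs(collection: ListPaired) -> Dict[str, str]:
    # Option 1: batch each pair separately, giving one report per sample
    return {
        sample: run_multiqc([pair["forward"], pair["reverse"]])
        for sample, pair in collection.items()
    }


def single_run(collection: ListPaired) -> str:
    # Option 2: one tool run over every dataset in the collection
    everything = [
        ds for pair in collection.values() for ds in (pair["forward"], pair["reverse"])
    ]
    return run_multiqc(everything)


fastqc_txt: ListPaired = {
    "s1": {"forward": "s1_R1.txt", "reverse": "s1_R2.txt"},
    "s2": {"forward": "s2_R1.txt", "reverse": "s2_R2.txt"},
}
print(map_over_pairs(fastqc_txt))  # two reports, one per pair
print(single_run(fastqc_txt))      # one report covering all four datasets
```

Option 1 yields one report per pair, while option 2 yields the single combined report that MultiQC users usually want.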
FastQC works because its input doesn't have multiple="true", which leaves batch mode available.
The current workaround is to manually flatten the collection into a list with the collection operation tools, but this really shouldn't be necessary.
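Galaxy's collection operations include flattening a nested collection into a plain list. The plain-Python sketch below only models the idea and is not Galaxy's implementation. `flatten` is a hypothetical helper, the nested-dict representation of list:paired is an assumption, and joining the outer and inner element identifiers with `_` is an illustrative choice.

```python
from typing import Dict, List, Tuple


def flatten(collection: Dict[str, Dict[str, str]], sep: str = "_") -> List[Tuple[str, str]]:
    """Flatten a nested list:paired model into a flat list of (identifier, dataset).

    Joining outer and inner identifiers with `sep` is an assumption for illustration.
    """
    flat = []
    for outer_id, inner in collection.items():
        for inner_id, dataset in inner.items():
            flat.append((f"{outer_id}{sep}{inner_id}", dataset))
    return flat


fastqc_txt = {
    "s1": {"forward": "s1_R1.txt", "reverse": "s1_R2.txt"},
    "s2": {"forward": "s2_R1.txt", "reverse": "s2_R2.txt"},
}
for identifier, dataset in flatten(fastqc_txt):
    print(identifier, dataset)
```

After flattening, the whole collection is a single plain list, which is the shape a multiple="true" data input can consume in one run.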
Also reported here: https://github.com/galaxyproject/galaxy/issues/5875
Looks like it's fixed in galaxyproject/galaxy#6255; it just needs to be backported to 18.05. Test was updated yesterday, so we should be able to test there.
So with that fix it's going to map over list:pairs if you have a list of pairs, meaning you get one MultiQC report per pair. In most circumstances you will still need to unpack your list of pairs. This is because FastQC, the software (not the wrapper), is not paired-end aware. A paired-end-aware QC tool would take in a pair and produce a single dataset.
Assuming I did this right, it still doesn't see this collection: https://test.galaxyproject.org/u/nate/h/multiqc-input-test
@mvdbeek Hrm, should the collection be selectable as an input now though?
Yes; did you rebuild the client?
@mvdbeek the client builds on-deploy for Test, it should be up to date
The input datasets are deleted for me, so that may be an issue:

They are not deleted for me, both before and after import... 😭

It just switched for me as well.
There were some more related fixes, though they should all be in dev.
Test is on https://github.com/galaxyproject/galaxy/commit/6911153c307ba25525df7c165b4f10b4260964ac
Yeah, something isn't working locally either
Also there is a boolean parameter at the bottom that is being swallowed:

Just retested on test.galaxyproject.org; still a problem, which might be expected.
Test history: https://test.galaxyproject.org:/u/jenjackson/h/test-history-multiqc-at-testgalaxyproject
Tested all three collection types; in none of them is the FastQC .txt output recognized by the tool yet.
So it's not just "paired list" or "pair" that MultiQC fails to detect, but also the collection type "list".
Added tags to the tests so others can better understand what those are.
I tried to drag and drop datasets out of a collection. One worked OK (forward); the other shows up weird in the select list. Is that a known issue or just something buggy on Test? I'll see if I can reproduce it on org.
On Main (usegalaxy.org) there are the same issues with dropped datasets, and it's actually worse: they all show up in the select list as "Dropped: XXXXXX" rather than as the dataset name with "(hidden)". Does this need a separate ticket, or is it a known issue? @jmchilton
Test history: https://usegalaxy.org/u/jen/h/test-history-multiqc-hidden-fastqc-rawdata; dataset 48 is what I tested by dragging the collection files over into MultiQC.
Related; this should probably be reopened rather than left closed: https://github.com/galaxyproject/tools-iuc/issues/1658
A simple list of raw data from FastQC works fine for me with MultiQC: https://usegalaxy.org/u/martenson/h/unnamed-history-2
Interesting: dataset 32 has basically the same content and doesn't work. Yours was a new list collection of merged txt output; mine was produced from a list of fastq datasets that was run through FastQC as a list collection input.
Not sure what the difference is under the hood, but it's a really good test/comparison. Maybe it will help figure out what is going wrong.
Can we create one issue per issue? I see different servers with different Galaxy versions and different inputs, dragged or not dragged. This is very confusing. The problem that regular lists can't be selected in multi-data inputs (as in MultiQC) on dev should be fixed by https://github.com/galaxyproject/galaxy/pull/6300.
Selecting pairs or lists of pairs in a multi-data input can be made possible, but it is probably not the right thing to do, as it wouldn't do what you want it to do. The correct thing is to use a QC tool that is paired-end aware and outputs a single report per fastq pair, or to unzip the collection and use a regular list.
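As a rough illustration of the "unzip" route, the plain-Python sketch below models splitting a list of pairs into two plain lists, one of forward and one of reverse datasets. It is not Galaxy's Unzip collection tool: `unzip_pairs` is a hypothetical helper, the nested-dict representation of list:paired is an assumption, and the sketch only models the expected shape of the result.

```python
from typing import Dict, Tuple

# Hypothetical model of a list:paired collection: {sample: {"forward": ..., "reverse": ...}}
ListPaired = Dict[str, Dict[str, str]]


def unzip_pairs(collection: ListPaired) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a list of pairs into two plain lists keyed by sample: forward and reverse."""
    forward = {sample: pair["forward"] for sample, pair in collection.items()}
    reverse = {sample: pair["reverse"] for sample, pair in collection.items()}
    return forward, reverse


fastqc_txt: ListPaired = {
    "s1": {"forward": "s1_R1.txt", "reverse": "s1_R2.txt"},
    "s2": {"forward": "s2_R1.txt", "reverse": "s2_R2.txt"},
}
forward, reverse = unzip_pairs(fastqc_txt)
# Each resulting plain list could then be selected as a whole in a multiple="true" input.
print(forward)
print(reverse)
```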
The dragging issues are separate, and if there's no issue on the main galaxy repo we should create an issue there.
This isn't on main yet (cannot see the collection txt files). Could we leave this open in usegalaxy-playbook until the changes are implemented, please? Just in case something else comes up during integration testing.
Dragging and dropping will need a ticket. I'll make one if someone else hasn't. Before, I wasn't sure whether it was a Main, Test, or Galaxy problem.