MultiQC
Picard Target Region Coverage hard to read
Description of bug
In recent versions of picard CollectHsMetrics (>2.23.8), PCT_TARGET_BASES is reported up to 100000x coverage, to support high depth amplicon sequencing (see https://github.com/broadinstitute/picard/pull/1542).
However, this extends the x-axis of the MultiQC graph up to 100000x, when most 'regular' sequencing projects have a coverage around 100x.
MultiQC graph for picard 2.23.5
MultiQC graph for picard 2.26.10 (same data)
File that triggers the error
No response
MultiQC Error log
No response
MultiQC graph for picard 2.26.10 (same data), with the code from https://github.com/ewels/MultiQC/pull/1626. Note that the x-axis is now logarithmic.
We've come across something similar with InsertSizeMetrics and WgsMetrics before and have these config options as a result:
https://multiqc.info/docs/#insertsizemetrics https://multiqc.info/docs/#wgsmetrics
Instead of a new solution, could we mimic the same pattern for this module for consistency?
I can add the config options for consistency, but I would still like to keep the default behaviour as is. That way, a regular 'naive' user will get the third plot by default, but they are free to modify it using the fold_coverage_xmax option. Otherwise, they will get the second plot by default, which would make MultiQC more difficult to use for anyone but power users.
I'm sure that MultiQC has functionality to cut off long tails for plots though 🤔 That's what I was originally thinking about with the above post. I'm sure that there is some function or config to set the xmax automatically based on, say, 90% of the data. I need to sit down and try to find this again.
Do you have an update for this issue? My current solution simply cuts off 0 values at the end of the range, so it should mimic the MultiQC functionality to cut off long tails. Note that this does not strip out 0 values that are between higher counts.
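To make the trailing-zero behaviour concrete, here is a minimal sketch of the idea. It is not the code from the PR: the function name `trim_trailing_zeros` and the example numbers are made up for illustration, and it assumes each sample is a plain dict mapping coverage to a value.

```python
def trim_trailing_zeros(series):
    """Drop the run of zero values at the high end of an {x: y} series.

    Hypothetical helper for illustration only; zeros that sit between
    non-zero values are kept.
    """
    xs = sorted(series)
    last = len(xs)
    # Walk back from the largest x until a non-zero y is found.
    while last > 0 and series[xs[last - 1]] == 0:
        last -= 1
    return {x: series[x] for x in xs[:last]}


# Made-up example: fraction of target bases at or above each coverage.
coverage = {1: 0.99, 10: 0.95, 100: 0.40, 250: 0.0, 500: 0.02, 1000: 0.0, 100000: 0.0}
trimmed = trim_trailing_zeros(coverage)
```

With this input the series ends at 500 instead of 100000, while the zero at 250 is kept because a higher non-zero value follows it.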
So the insertsize module simply has a config option to set xmax:
https://github.com/ewels/MultiQC/blob/81dd59cb6f582bf198e3058b02996d39b02b8175/multiqc/modules/picard/InsertSizeMetrics.py#L188-L191
So a short-term fix for just you would be to customise the plot config for this plot when you run MultiQC (see docs).
The WgsMetrics module is a bit more clever and is the one I was thinking about. Unless a threshold is manually defined in a config, it runs over the data and sets the xmax at 99%:
https://github.com/ewels/MultiQC/blob/81dd59cb6f582bf198e3058b02996d39b02b8175/multiqc/modules/picard/WgsMetrics.py#L148-L159
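For readers who don't want to follow the link, the general approach can be sketched roughly as below. This is a simplified illustration and not the linked WgsMetrics code: `xmax_for_fraction` and `user_xmax` are hypothetical names, and the data is assumed to be a histogram dict of coverage to count.

```python
def xmax_for_fraction(histogram, fraction=0.99, user_xmax=None):
    """Pick an x-axis maximum that covers `fraction` of the total counts.

    Hypothetical sketch: a manually configured value always wins.
    """
    if user_xmax is not None:
        return user_xmax
    total = sum(histogram.values())
    if total == 0:
        return None  # nothing to base a cutoff on
    running = 0
    for x in sorted(histogram):
        running += histogram[x]
        if running >= fraction * total:
            return x
    return max(histogram)  # fallback, e.g. if rounding prevents a match


# Made-up histogram with a long, nearly empty tail.
hist = {0: 5, 10: 50, 50: 900, 100: 40, 1000: 4, 100000: 1}
auto_xmax = xmax_for_fraction(hist)
manual_xmax = xmax_for_fraction(hist, user_xmax=500)
```

Here the automatic cutoff lands at 100, because 995 of the 1000 counts sit at or below that coverage, while the manual setting is returned untouched.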
What would be ideal would be to take this code (or something like it) and move it into the core line graph plotting code. Then it could be switched on or off for any line graph plot with a config option. There's already some similar stuff, like the data smoothing code, so it could follow a similar model (probably just setting xmax if it's not already set).
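As a rough sketch of what such a core hook might look like: the `auto_xmax` config key and the `apply_auto_xmax` function below are hypothetical and do not exist in MultiQC; only `xmax` is taken from the discussion above. The data is assumed to be a dict of sample name to an {x: count} dict.

```python
def apply_auto_xmax(data, pconfig, fraction=0.99):
    """Return a copy of pconfig with xmax filled in when it was left unset.

    Hypothetical sketch: `auto_xmax` is an invented opt-in key, and an
    xmax that is already present is never overwritten.
    """
    updated = dict(pconfig)
    if "xmax" in updated or not updated.get("auto_xmax", False):
        return updated
    cutoffs = []
    for series in data.values():
        total = sum(series.values())
        if total <= 0:
            continue  # skip empty samples
        running = 0
        for x in sorted(series):
            running += series[x]
            if running >= fraction * total:
                cutoffs.append(x)
                break
    if cutoffs:
        # Use the largest per-sample cutoff so no sample is clipped early.
        updated["xmax"] = max(cutoffs)
    return updated


# Made-up data for two samples, each with a sparse tail out to 1000.
data = {
    "s1": {1: 900, 2: 95, 1000: 5},
    "s2": {1: 500, 5: 496, 1000: 4},
}
auto = apply_auto_xmax(data, {"auto_xmax": True})
manual = apply_auto_xmax(data, {"auto_xmax": True, "xmax": 200})
untouched = apply_auto_xmax(data, {})
```

Taking the maximum across samples is one possible design choice. It keeps the plot from clipping any single sample before its cutoff, at the cost of a wider axis when one sample has a much longer tail than the rest.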
If you fancy having a stab at writing this, that would be great. As you can see by the number of open PRs I have a huge backlog currently (new job + paternity leave) but I'm doing my best to work through it.
Phil
Is anyone owning this issue since Phil has been out?
I'm back tomorrow! 🎉 I'll try to take a look when I can. It'll likely still be a little while, so if anyone fancies having a go at what I proposed above then please go ahead 👍🏻 (just make a comment here first so that we don't duplicate work).
This is fixed with https://github.com/ewels/MultiQC/pull/1626 :)