MultiQC
MultiQC copied to clipboard
Add %Q0 in the barplot from samtools stats module
Description of feature
Hello everyone,
I would like to add a bit more option to the bar plot from samtools stats module. For our purposes, it would make a lot of sense if we could see quicker the info about the % of Q0 mapped reads in the barplot. Maybe as a subcategory of mapped ? Like still a blue color to keep the binary vision, but a different kind of blue color so we can distinguish Q0 from others ?
I am willing to try to develop this feature but I want to make sure it makes sense for others too :)
Let me know if more information is needed !
Best regards,
Roxane
Figured out this can be done by just adding a new line in the stats module like so :
def alignment_chart(data):
"""Make the HighCharts HTML to plot the alignment rates """
keys = OrderedDict()
keys['reads_mapped'] = {'color': '#437BB1', 'name': 'Mapped'}
keys['reads_MQ0'] = {'color': '#FF9933', 'name': 'MQ0'}
keys['reads_unmapped'] = {'color': '#B1084C', 'name': 'Unmapped'}
# Config for the plot
plot_conf = {
'id': 'samtools_alignment_plot',
'title': 'Samtools stats: Alignment Scores',
'ylab': '# Reads',
'cpswitch_counts_label': 'Number of Reads'
}
return bargraph.plot(data, keys, plot_conf)
Hi Roxane @RxLoutre,
Thank you for the suggestion! Do you have an example of samtools stats outputs with a non-zero MQ0 to test it?
The idea of a barplot is that the read counts there should sum up to the total number of reads. Unfortunately reads_mapped
+ reads_unmapped
+ reads_MQ0
do not sum up to sequences
, however, we can split up the mapped reads number and use reads_mapped – reads_MQ0
+ reads_unmapped
+ reads_MQ0
as bar keys. But would need to some test example to make sure it works properly.
Hi @vladsavelyev !
Happy you are picking this up !
I wasn't too sure my solution was elegant, so I never pushed it on github. But this is how I added the MQ0 reads in the stats.py of the module samtools :
#At the begining of the module, I added a line to make a new entry in the dictionnary with existing entries. Just add the line below my comment in the correct place.
self.samtools_stats = dict()
for f in self.find_log_files("samtools/stats"):
parsed_data = dict()
for line in f["f"].splitlines():
if not line.startswith("SN"):
continue
sections = line.split("\t")
field = sections[1].strip()[:-1]
field = field.replace(" ", "_")
print("Field : {}".format(field))
value = float(sections[2].strip())
parsed_data[field] = value
#***********************Added by ROX************************
parsed_data["usefull_reads"] = parsed_data["reads_mapped"] - parsed_data["reads_MQ0"]
parsed_data["useless_reads"] = parsed_data["reads_mapped"] - parsed_data["usefull_reads"]
if len(parsed_data) > 0:
# Work out some percentages
if "raw_total_sequences" in parsed_data:
for k in list(parsed_data.keys()):
if (
k.startswith("reads_")
and k != "raw_total_sequences"
and parsed_data["raw_total_sequences"] > 0
):
parsed_data["{}_percent".format(k)] = (
parsed_data[k] / parsed_data["raw_total_sequences"]
) * 100
And then further down :
#At the begining of the module, I added a line to make a new entry in the dictionnary with existing entries. Just add the line below my comment in the correct place.
self.samtools_stats = dict()
for f in self.find_log_files("samtools/stats"):
parsed_data = dict()
for line in f["f"].splitlines():
if not line.startswith("SN"):
continue
sections = line.split("\t")
field = sections[1].strip()[:-1]
field = field.replace(" ", "_")
print("Field : {}".format(field))
value = float(sections[2].strip())
parsed_data[field] = value
#***********************Added by ROX************************
parsed_data["usefull_reads"] = parsed_data["reads_mapped"] - parsed_data["reads_MQ0"]
parsed_data["useless_reads"] = parsed_data["reads_mapped"] - parsed_data["usefull_reads"]
if len(parsed_data) > 0:
# Work out some percentages
if "raw_total_sequences" in parsed_data:
for k in list(parsed_data.keys()):
if (
k.startswith("reads_")
and k != "raw_total_sequences"
and parsed_data["raw_total_sequences"] > 0
):
parsed_data["{}_percent".format(k)] = (
parsed_data[k] / parsed_data["raw_total_sequences"]
) * 100
Don't mind the wording I chose there (usefull reads etc), I did it for a custom request, but this is just an example on how I extracted and then displayed the MQ0 :)
@RxLoutre, have you got a samtools stats
output with a non-zero reads_MQ0
value? Can you attach it to a comment, please?
Sorry I don't have anymore at this moment :(