MultiQC Add %Q0 in the barplot from samtools stats module

Add %Q0 in the barplot from samtools stats module

Open RxLoutre opened this issue 2 years ago • 1 comments

Description of feature

Hello everyone,

I would like to add a bit more option to the bar plot from samtools stats module. For our purposes, it would make a lot of sense if we could see quicker the info about the % of Q0 mapped reads in the barplot. Maybe as a subcategory of mapped ? Like still a blue color to keep the binary vision, but a different kind of blue color so we can distinguish Q0 from others ?

I am willing to try to develop this feature but I want to make sure it makes sense for others too :)

Let me know if more information is needed !

Best regards,

Roxane

Apr 25 '22 07:04 RxLoutre

Figured out this can be done by just adding a new line in the stats module like so :

def alignment_chart(data):
    """Make the HighCharts HTML to plot the alignment rates """
    keys = OrderedDict()
    keys['reads_mapped'] = {'color': '#437BB1', 'name': 'Mapped'}
    keys['reads_MQ0'] = {'color': '#FF9933', 'name': 'MQ0'}
    keys['reads_unmapped'] = {'color': '#B1084C', 'name': 'Unmapped'}

    # Config for the plot
    plot_conf = {
        'id': 'samtools_alignment_plot',
        'title': 'Samtools stats: Alignment Scores',
        'ylab': '# Reads',
        'cpswitch_counts_label': 'Number of Reads'
    }
    return bargraph.plot(data, keys, plot_conf)

Apr 25 '22 09:04 RxLoutre

Hi Roxane @RxLoutre,

Thank you for the suggestion! Do you have an example of samtools stats outputs with a non-zero MQ0 to test it?

The idea of a barplot is that the read counts there should sum up to the total number of reads. Unfortunately reads_mapped + reads_unmapped + reads_MQ0 do not sum up to sequences, however, we can split up the mapped reads number and use reads_mapped – reads_MQ0 + reads_unmapped + reads_MQ0 as bar keys. But would need to some test example to make sure it works properly.

Oct 16 '23 08:10 vladsavelyev

Hi @vladsavelyev !

Happy you are picking this up !

I wasn't too sure my solution was elegant, so I never pushed it on github. But this is how I added the MQ0 reads in the stats.py of the module samtools :

#At the begining of the module, I added a line to make a new entry in the dictionnary with existing entries. Just add the line below my comment in the correct place.
self.samtools_stats = dict()
for f in self.find_log_files("samtools/stats"):
    parsed_data = dict()
    for line in f["f"].splitlines():
        if not line.startswith("SN"):
            continue
        sections = line.split("\t")
        field = sections[1].strip()[:-1]
        field = field.replace(" ", "_")
        print("Field : {}".format(field))
        value = float(sections[2].strip())
        parsed_data[field] = value
    #***********************Added by ROX************************
    parsed_data["usefull_reads"] = parsed_data["reads_mapped"] - parsed_data["reads_MQ0"]
    parsed_data["useless_reads"] = parsed_data["reads_mapped"] - parsed_data["usefull_reads"]

    if len(parsed_data) > 0:
               # Work out some percentages
        if "raw_total_sequences" in parsed_data:
            for k in list(parsed_data.keys()):
                if (
                    k.startswith("reads_")
                    and k != "raw_total_sequences"
                    and parsed_data["raw_total_sequences"] > 0
                ):
                    parsed_data["{}_percent".format(k)] = (
                        parsed_data[k] / parsed_data["raw_total_sequences"]
                    ) * 100

And then further down :


#At the begining of the module, I added a line to make a new entry in the dictionnary with existing entries. Just add the line below my comment in the correct place.
self.samtools_stats = dict()
for f in self.find_log_files("samtools/stats"):
    parsed_data = dict()
    for line in f["f"].splitlines():
        if not line.startswith("SN"):
            continue
        sections = line.split("\t")
        field = sections[1].strip()[:-1]
        field = field.replace(" ", "_")
        print("Field : {}".format(field))
        value = float(sections[2].strip())
        parsed_data[field] = value
    #***********************Added by ROX************************
    parsed_data["usefull_reads"] = parsed_data["reads_mapped"] - parsed_data["reads_MQ0"]
    parsed_data["useless_reads"] = parsed_data["reads_mapped"] - parsed_data["usefull_reads"]

    if len(parsed_data) > 0:
               # Work out some percentages
        if "raw_total_sequences" in parsed_data:
            for k in list(parsed_data.keys()):
                if (
                    k.startswith("reads_")
                    and k != "raw_total_sequences"
                    and parsed_data["raw_total_sequences"] > 0
                ):
                    parsed_data["{}_percent".format(k)] = (
                        parsed_data[k] / parsed_data["raw_total_sequences"]
                    ) * 100

Oct 16 '23 09:10 RxLoutre

Don't mind the wording I chose there (usefull reads etc), I did it for a custom request, but this is just an example on how I extracted and then displayed the MQ0 :)

Oct 16 '23 09:10 RxLoutre

@RxLoutre, have you got a samtools stats output with a non-zero reads_MQ0 value? Can you attach it to a comment, please?

Oct 16 '23 09:10 vladsavelyev

Sorry I don't have anymore at this moment :(

Oct 16 '23 09:10 RxLoutre

MultiQC MultiQC copied to clipboard

Add %Q0 in the barplot from samtools stats module

Description of feature

MultiQC
MultiQC copied to clipboard