
Output full lineage information

Open lijierr opened this issue 3 years ago • 3 comments

Hi, I am wondering whether Bracken can output the full lineage of each identified taxon, for example "Bacteria; Proteobacteria; Gammaproteobacteria; Burkholderiales; Rhodocyclaceae; UTPRO2; UTPRO2_sp1234r4"? Thank you.

lijierr avatar Aug 23 '21 10:08 lijierr

Oops, sorry, it seems kreport2mpa.py will do the job. However, this script is not included in version 2.6.2, although it was present in version 2.6.0.

lijierr avatar Aug 23 '21 10:08 lijierr

This is still a little tricky. Would it be possible to output the upper-level lineage information as well?

For example, when working at the Genus level, could the output also include Domain, Phylum, Class, Order, and Family?

lijierr avatar Sep 03 '21 11:09 lijierr

The following worked for me. Essentially, I took the sample_bracken.report file from Bracken's output and ran it through these Snakemake rules:

#########################
### MPA-style report ###
rule mpa_report:
    input:
        report=os.path.join(RESULTS_DIR, "bracken/{sid}_bracken.report")
    output:
        mpa=os.path.join(RESULTS_DIR, "mpa_report/{sid}_mpa.tsv")
    conda:
        os.path.join(ENV_DIR, "bracken_new.yaml")
    log:
        os.path.join(RESULTS_DIR, "logs/mpa_{sid}.log")
    message:
        "Creating mpa-style report for {wildcards.sid}"
    shell:
        "(date && kreport2mpa.py -r {input.report} -o {output.mpa} && date)  &> >(tee {log})"

rule combine_mpa:
    input:
        mpa=expand(os.path.join(RESULTS_DIR, "mpa_report/{sid}_mpa.tsv"), sid=SAMPLES)
    output:
        combined=os.path.join(RESULTS_DIR, "mpa_report/combined_mpa.tsv")
    conda:
        os.path.join(ENV_DIR, "krakentools.yaml")
    params:
        combine=os.path.join(SRC_DIR, "combine_mpa_modified.py")
    log:
        os.path.join(RESULTS_DIR, "logs/mpa_combine.log")
    message:
        "Creating a combined mpa-style report"
    shell:
        "(date && {params.combine} -i {input.mpa} -d $(dirname {output.combined}) && date)  &> >(tee {log})"

I used a modified script to keep the header names in the combined output file as described here: https://github.com/jenniferlu717/KrakenTools/issues/81
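To illustrate what "keeping the header names" can look like, here is a minimal Python sketch. It is not the modified script from the linked issue; the function names `read_mpa` and `combine`, and the `#Classification` header label, are assumptions made for illustration. It merges per-sample MPA tables into one table whose header row carries the sample names.

```python
def read_mpa(path):
    """Read one MPA-style file into {taxonomy: value}, skipping '#' header lines."""
    counts = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            taxonomy, value = line.rstrip("\n").split("\t")[:2]
            counts[taxonomy] = value
    return counts


def combine(samples):
    """Merge {sample_name: {taxonomy: value}} into TSV rows with a named header.

    Taxa missing from a sample are filled with "0" (an assumption of this sketch).
    """
    names = list(samples)
    taxa = sorted({taxon for counts in samples.values() for taxon in counts})
    rows = ["#Classification\t" + "\t".join(names)]
    for taxon in taxa:
        values = [samples[name].get(taxon, "0") for name in names]
        rows.append(taxon + "\t" + "\t".join(values))
    return rows
```

For example, `combine({"A": read_mpa("A_mpa.tsv"), "B": read_mpa("B_mpa.tsv")})` would give rows that can be written out with `"\n".join(...)`; the file names here are placeholders.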

Please do check for any errors or quirks, though. I hope it helps!
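For the earlier question about getting the upper-level lineage at a given rank (e.g. Genus), the MPA-style output can be filtered afterwards. The sketch below assumes each line looks like `d__Bacteria|p__Proteobacteria|...|g__UTPRO2<TAB>count`, with rank prefixes separated from names by a double underscore; depending on the KrakenTools version the top-level prefix may differ (e.g. `k__`), which this code does not depend on. The function name `lineages_at_rank` is hypothetical.

```python
def lineages_at_rank(mpa_lines, rank_prefix="g"):
    """Yield (lineage, value) for rows whose deepest level has the given rank prefix.

    The lineage is written as "Domain; Phylum; ...; Genus" with rank prefixes removed.
    """
    for line in mpa_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        taxonomy, _, value = line.partition("\t")
        levels = taxonomy.split("|")
        if not levels[-1].startswith(rank_prefix + "__"):
            continue  # deepest level is not the requested rank
        names = [lvl.split("__", 1)[1] if "__" in lvl else lvl for lvl in levels]
        yield "; ".join(names), value


# Made-up example rows, for illustration only
example = [
    "d__Bacteria\t100",
    "d__Bacteria|p__Proteobacteria\t80",
    "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Burkholderiales"
    "|f__Rhodocyclaceae|g__UTPRO2\t40",
    "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Burkholderiales"
    "|f__Rhodocyclaceae|g__UTPRO2|s__UTPRO2_sp1234r4\t40",
]

for lineage, value in lineages_at_rank(example):
    print(lineage, value, sep="\t")
```

Passing `rank_prefix="s"` would select species-level rows instead; on a real file, the lines can come from `open("sample_mpa.tsv")` (placeholder name).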

susheelbhanu avatar Sep 01 '23 12:09 susheelbhanu