Bracken
Output full lineage information
Hi, I am wondering whether Bracken can output the full lineage of each identified taxon, like "Bacteria; Proteobacteria; Gammaproteobacteria; Burkholderiales; Rhodocyclaceae; UTPRO2; UTPRO2_sp1234r4"? Thank you.
Oops, sorry, it seems kreport2mpa.py will do the job. However, this script is not included in version 2.6.2; there was one in version 2.6.0.
It is still a little tricky, though: would it be possible to output the upper-level lineage information as well?
For example, when working at the genus level, also output the domain, phylum, class, order, and family?
Doing the following worked for me. Essentially, I took the sample_bracken.report file from Bracken's output and ran these two Snakemake rules:
#########################
### MPA-style report ###
rule mpa_report:
    input:
        report=os.path.join(RESULTS_DIR, "bracken/{sid}_bracken.report")
    output:
        mpa=os.path.join(RESULTS_DIR, "mpa_report/{sid}_mpa.tsv")
    conda:
        os.path.join(ENV_DIR, "bracken_new.yaml")
    log:
        os.path.join(RESULTS_DIR, "logs/mpa_{sid}.log")
    message:
        "Creating mpa-style report for {wildcards.sid}"
    shell:
        "(date && kreport2mpa.py -r {input.report} -o {output.mpa} && date) &> >(tee {log})"

rule combine_mpa:
    input:
        mpa=expand(os.path.join(RESULTS_DIR, "mpa_report/{sid}_mpa.tsv"), sid=SAMPLES)
    output:
        combined=os.path.join(RESULTS_DIR, "mpa_report/combined_mpa.tsv")
    conda:
        os.path.join(ENV_DIR, "krakentools.yaml")
    params:
        combine=os.path.join(SRC_DIR, "combine_mpa_modified.py")
    log:
        os.path.join(RESULTS_DIR, "logs/mpa_combine.log")
    message:
        "Creating a combined mpa-style report"
    shell:
        "(date && {params.combine} -i {input.mpa} -d $(dirname {output.combined}) && date) &> >(tee {log})"
I used a modified script to keep the header names in the combined output file, as described here: https://github.com/jenniferlu717/KrakenTools/issues/81
Please do check for any errors or quirks, though. I hope it helps!
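For the genus-level question earlier in the thread, here is a minimal sketch of one way to post-process a single-sample MPA-style table so that each genus row also carries its domain, phylum, class, order, and family. It is not part of Bracken or KrakenTools. It assumes each line looks like `d__Bacteria|p__Proteobacteria|...|g__UTPRO2<TAB>value`, with rank prefixes joined by `|`, and it accepts both `d__` and `k__` for the top rank because I am not sure which one a given tool version writes. The helper names (`parse_lineage`, `rows_at_rank`) and the example counts are made up for illustration.

```python
# Assumed MPA-style rank prefixes; "d" and "k" are both treated as the top rank.
RANKS = {"d": "domain", "k": "domain", "p": "phylum", "c": "class",
         "o": "order", "f": "family", "g": "genus", "s": "species"}
COLUMNS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split 'd__A|p__B|...' into a {rank_name: taxon_name} dict."""
    ranks = {}
    for part in lineage.split("|"):
        prefix, sep, name = part.partition("__")
        if sep and prefix in RANKS:
            ranks[RANKS[prefix]] = name
    return ranks


def deepest_rank(lineage):
    """Return the rank name of the last element of the lineage, or None."""
    prefix = lineage.split("|")[-1].partition("__")[0]
    return RANKS.get(prefix)


def rows_at_rank(lines, rank="genus"):
    """Yield one dict per line whose deepest rank is `rank`, with upper ranks filled in."""
    wanted = COLUMNS[: COLUMNS.index(rank) + 1]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blank lines and header lines
            continue
        lineage, _, value = line.partition("\t")
        if deepest_rank(lineage) != rank:
            continue
        ranks = parse_lineage(lineage)
        row = {col: ranks.get(col, "") for col in wanted}
        row["value"] = value
        yield row


# Hypothetical input lines; the counts are invented for the example.
example = [
    "d__Bacteria\t100",
    "d__Bacteria|p__Proteobacteria\t80",
    "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Burkholderiales"
    "|f__Rhodocyclaceae|g__UTPRO2\t42",
]
genus_rows = list(rows_at_rank(example, rank="genus"))
for row in genus_rows:
    print("; ".join(row[c] for c in COLUMNS[:6]), row["value"], sep="\t")
```

To run it on a real file, something like `rows_at_rank(open("sample_mpa.tsv"))` should work (the file name here is a placeholder). Ranks missing from a lineage are left as empty strings rather than guessed.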