Bracken
est_abundance.py, ValueError: '1' is not in list
Hi all,
I downloaded the Minikraken indexes (https://genome-idx.s3.amazonaws.com/kraken/minikraken2_v2_8GB_201904.tgz), which contain a Kraken 2 database along with Bracken databases built for 100-, 150-, and 200-mers. When I run Bracken with
bracken -d ${KRAKEN_DB} -i ${SAMPLE}.kreport -o ${SAMPLE}.bracken -r ${READ_LEN} -l ${CLASSIFICATION_LEVEL} -t ${THRESHOLD}
it gives the following error:
Checking for Valid Options...
Running Bracken
>> python src/est_abundance.py -i ZZ_unmappedreads_report_alltaxa -o ZZ_unmapped_alltaxa.bracken -k minikraken2_v2_8GB_201904_UPDATE/database100mers.kmer_distrib -l S -t 10
PROGRAM START TIME: 07-25-2023 19:18:32
Checking report file: ZZ_unmappedreads_report_alltaxa
Traceback (most recent call last):
  File "/opt/ohpc/pub/spack/opt/spack/linux-rocky8-broadwell/gcc-12.2.0/bracken-2.7-zyb52vlyksuxgzxlui23b3vfvapvsluf/bin/src/est_abundance.py", line 534, in <module>
    main()
  File "/opt/ohpc/pub/spack/opt/spack/linux-rocky8-broadwell/gcc-12.2.0/bracken-2.7-zyb52vlyksuxgzxlui23b3vfvapvsluf/bin/src/est_abundance.py", line 303, in main
    elif main_lvls.index(level_id[0]) >= branch_lvl:
ValueError: '1' is not in list
Update:
I installed the newer Bracken release (Bracken-2.8) and re-ran the above command. This time it gave a different error:
Checking for Valid Options...
Running Bracken
>> python src/est_abundance.py -i ZZ_unmappedreads_report_alltaxa -o ZZ_unmapped_alltaxa.bracken -k minikraken2_v2_8GB_201904_UPDATE/database100mers.kmer_distrib -l S -t 0
PROGRAM START TIME: 07-25-2023 20:14:11
Checking report file: ZZ_unmappedreads_report_alltaxa
Traceback (most recent call last):
  File "./Bracken-2.8/src/est_abundance.py", line 554, in <module>
    main()
  File "./Bracken-2.8/src/est_abundance.py", line 323, in main
    elif main_lvls.index(level_id[0]) >= branch_lvl:
IndexError: string index out of range
Could anyone please help me fix this error?
Thanks
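Both tracebacks above fail on the same expression, main_lvls.index(level_id[0]). The following is a minimal sketch of how that expression can raise either error, depending on what sits in the rank column of the report. The contents of MAIN_LVLS are an assumption made for illustration (the real list is defined in est_abundance.py and may differ between Bracken versions), and rank_position and describe are hypothetical helper names.

```python
# Assumed list of main rank codes; the real one is defined in est_abundance.py.
MAIN_LVLS = ['R', 'K', 'D', 'P', 'C', 'O', 'F', 'G', 'S']


def rank_position(level_id):
    """Mimic the failing expression main_lvls.index(level_id[0])."""
    return MAIN_LVLS.index(level_id[0])


def describe(level_id):
    """Report what happens for a given rank-column value."""
    try:
        return "ok: position %d" % rank_position(level_id)
    except ValueError as err:
        # First character is not a known rank letter, e.g. a rank of "1".
        return "ValueError: %s" % err
    except IndexError as err:
        # Rank column is empty, so level_id[0] does not exist.
        return "IndexError: %s" % err


for value in ("P1", "1", ""):
    print(repr(value), "->", describe(value))
```

Under these assumptions, a rank value whose first character is not a rank letter gives the ValueError, and an empty rank value gives the IndexError.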
Same error, but using different databases.
I tried the same command on a different Kraken report that contained more data and got no error, so this may be an issue that arises when a report has few taxa or little information.
I got the same error as well. Is there an ETA on the fix?
@lauren-mak @JHarrisonEcoEvo @ChathumadaviE Can you email me your report files ([email protected])?
Has there been any update to this by chance?
I'm using Bracken 2.9 and generated reports with kraken2 2.0.8-beta (I believe) using the following commands. They're part of a loop, as I have many files to analyze, but Bracken gives the same error when run on a single report file at the command line with the variables defined.
$kraken2 --db ./"$db" --output $out $val --threads "$threads" --report "$report"
$bracken-build -d ./"$db" -t "$threads" -k "$kmerLgth" -l "$rdLgth" -x <brackenLoc>
$bracken -d ./"$db" -i "$var" -o "$var2" -r "$rdLgth" -t 10
The Kraken2 report looks something like the following. I've changed root and Bacteria to have - and D respectively in the fourth column, only to get the same error.
61.41 112556 112556 U 0 unclassified
38.59 70738 0 1 root
38.59 70738 10 1 131567 cellular organisms
38.31 70226 27 2759 Eukaryota
38.22 70049 0 K 33090 Viridiplantae
38.22 70049 0 P 35493 Streptophyta
38.22 70049 0 P1 131221 Streptophytina
38.22 70049 0 P2 3193 Embryophyta
38.22 70049 0 P3 58023 Tracheophyta
38.22 70049 0 P4 78536 Euphyllophyta
38.22 70049 0 P5 58024 Spermatophyta
38.22 70049 0 C 3398 Magnoliopsida
38.22 70049 0 C1 1437183 Mesangiospermae
38.22 70049 0 C2 71240 eudicotyledons
38.22 70049 0 C3 91827 Gunneridae
38.22 70049 0 C4 1437201 Pentapetalae
38.22 70049 0 C5 71275 rosids
38.22 70049 0 C6 91835 fabids
38.22 70049 0 O 3502 Fagales
38.22 70049 0 F 16714 Juglandaceae
38.22 70049 0 G 13402 Carya
38.22 70049 70049 S 32201 Carya illinoinensis
0.08 150 0 D1 33154 Opisthokonta
0.08 150 0 K 4751 Fungi
0.08 150 2 K1 451864 Dikarya
0.06 110 0 P 4890 Ascomycota
...
Checking report file: 16d13MX5d4_unalingedFin_.kreport
Traceback (most recent call last):
  File "/project/pecan_scab_gwas/bracken/bin/src/est_abundance.py", line 554, in <module>
    main()
  File "/project/pecan_scab_gwas/bracken/bin/src/est_abundance.py", line 323, in main
    elif main_lvls.index(level_id[0]) >= branch_lvl:
IndexError: string index out of range
I had not noticed that Eukaryota is also missing a value in the fourth column. Bracken seems to run normally when I use D there.
Ugg, if you have hundreds of files to fix, the following script might be useful. It replaces the missing classification of root, Bacteria, and Eukaryota in all kreports in the CWD with -, D, and D respectively. Modify it for your database accordingly.
for val in *.kreport
do
# Output name: sample.kreport -> sample_corrected.kreport
fix=$(echo "$val" | sed -r 's/^(.+)\.kreport$/\1_corrected.kreport/')
# Rewrite the 4th (rank) column: - for root, D for Bacteria and Eukaryota
sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)(root)$/\1\2\3-\t\5\6/g' "$val" |
sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\s+)(Bacteria)$/\1\2\3D\t\5\6/g' |
sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\s+)(Eukaryota)$/\1\2\3D\t\5\6/g' > "$fix"
done
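For anyone who prefers to avoid the regular expressions, the same workaround can be sketched in Python. This assumes the report is tab-separated with six columns and that the rank code is the fourth one, as the sed script above also assumes. DEFAULT_RANKS, fill_missing_ranks, and fix_report are hypothetical names chosen for this example. Unlike the sed version, this sketch only touches rows whose rank column is empty.

```python
# Rank codes to fill in, keyed by taxon name (taken from the workaround above).
DEFAULT_RANKS = {"root": "-", "Bacteria": "D", "Eukaryota": "D"}


def fill_missing_ranks(line, ranks=DEFAULT_RANKS):
    """Return one report row with an empty rank column filled in, if known."""
    fields = line.rstrip("\n").split("\t")
    # Only touch standard six-column rows whose rank (4th) column is empty.
    if len(fields) == 6 and not fields[3].strip():
        name = fields[5].strip()  # names are indented with spaces
        if name in ranks:
            fields[3] = ranks[name]
    return "\t".join(fields)


def fix_report(src, dst):
    """Copy a report from src to dst, filling missing rank codes on the way."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(fill_missing_ranks(line) + "\n")
```

Calling fix_report("sample.kreport", "sample_corrected.kreport") would mirror one iteration of the shell loop. Add further taxon names to DEFAULT_RANKS if your database leaves other rows without a rank.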
I realized that it is an issue with some of the major kingdoms not having a level, which is strange to me. I'm working on trying to fix this.
Would you happen to have a fix for this now? I get the same error on running Bracken with a custom-made Kraken2 database. I also could trace it back to the major taxa level not having the level specified.
Here are those lines with the issue:
4.59 10735 10735 U 0 unclassified
95.41 223140 170 1 root
95.31 222904 923 1 131567 cellular organisms
91.20 213289 5373 2 Bacteria
3.71 8676 172 2759 Eukaryota
0.01 16 0 2157 Archaea
0.02 49 0 10239 Viruses
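To find rows like these across many reports before running Bracken, a small check along the following lines may help. It is only a sketch: it assumes a tab-separated, six-column report with the rank code in the fourth column, and it assumes that valid codes start with one of the letters in MAIN_LVLS (an illustrative set, not taken from the Bracken source). The values U and - are treated as acceptable here, based on the reports earlier in this thread. find_bad_ranks is a hypothetical helper name.

```python
# Assumed set of leading rank letters; adjust to match your Bracken version.
MAIN_LVLS = set("RKDPCOFGS")


def find_bad_ranks(lines):
    """Return (line_number, taxid, name) for rows with a suspicious rank code."""
    bad = []
    for num, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a standard six-column report row
        rank = fields[3].strip()
        taxid = fields[4].strip()
        name = fields[5].strip()
        if rank in ("U", "-"):
            continue  # unclassified or explicitly unranked
        # Flag empty codes and codes that do not start with a rank letter.
        if not rank or rank[0] not in MAIN_LVLS:
            bad.append((num, taxid, name))
    return bad


sample = [
    "4.59\t10735\t10735\tU\t0\tunclassified\n",
    "95.41\t223140\t170\t\t1\troot\n",
    "95.31\t222904\t923\t1\t131567\t  cellular organisms\n",
    "91.20\t213289\t5373\t\t2\t    Bacteria\n",
]
for row in find_bad_ranks(sample):
    print(row)
```

For a real file, pass the open file handle, for example find_bad_ranks(open("sample.kreport")).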
May I also send you my Kraken2 reports to check? Unfortunately, the sed script from the earlier comment is not working for me.
Any help would be appreciated. Thank you :)