Bracken
The newest version should now work with krakenuniq databases.
There is still something funny going on. I have the latest bracken build in ~/build/Bracken and I have the executables as symlinks in ~/bin. They are in the path. For krakenuniq I am using a conda package version 0.6. When I had kraken and kraken2 installed with conda alongside krakenuniq I got this error:
>bracken-build -k 35 -l 75 -d /data/cjb/db_052422 -x /home/cjb/miniconda3/envs/biokraken/bin -y krakenuniq -t 32
>> Selected Options:
kmer length = 35
read length = 75
database = /data/cjb/db_052422
threads = 32
kraken type = krakenuniq
>> Checking for Valid Options...
ERROR: Kraken2 Database incomplete: /data/cjb/db_052422/hash.k2d does not exist
When I uninstalled kraken and kraken2 leaving only krakenuniq, bracken-build started working:
> bracken-build -k 35 -l 75 -d /data/cjb/db_052422 -x /home/cjb/miniconda3/envs/biokraken/bin -y krakenuniq -t 32
>> Selected Options:
kmer length = 35
read length = 75
database = /data/cjb/db_052422
threads = 32
kraken type = krakenuniq
>> Checking for Valid Options...
>> Creating database.kraken [if not found]
database.kraken.tsv exists, skipping creation....
Finished creating database.kraken [in DB folder]
>> Creating database75mers.kmer_distrib
>>STEP 0: PARSING COMMAND LINE ARGUMENTS
Taxonomy nodes file: /data/cjb/db_052422/taxonomy/nodes.dmp
Seqid file: /data/cjb/db_052422/seqid2taxid.map
Num Threads: 32
Kmer Length: 35
Read Length: 75
>>STEP 1: READING SEQID2TAXID MAP
64663 total sequences read
>>STEP 2: READING NODES.DMP FILE
2422769 total nodes read
>>STEP 3: CONVERTING KMER MAPPINGS INTO READ CLASSIFICATIONS:
75mers, with a database built using 35mers
etc.
That's as far as I have gotten so far.
Subsequently, bracken ran cleanly using that database on krakenuniq reports.
I'm using Krakenuniq with the pre-built database downloaded from https://benlangmead.github.io/aws-indexes/k2, the 384G one labelled as EuPathDB48, to generate the report file.
Bracken was installed with the install shell script rather than via conda.
When I ran the program, it said:
bracken -d ~/krakenuniq -i new_report.tsv -o new_bracken -w new_bracken_report -r 50 -l S -t 0
Checking for Valid Options...
Running Bracken
>> python src/est_abundance.py -i new_report.tsv -o new_bracken -k /hdd1/home/f22_yfeng/krakenuniq/database50mers.kmer_distrib -l S -t 0
PROGRAM START TIME: 10-13-2022 11:57:37
Checking report file: new_report.tsv
Traceback (most recent call last):
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 554, in <module>
    main()
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 339, in main
    [mapped_taxid, mapped_taxid_dict] = process_kmer_distribution(line,lvl_taxids,map2lvl_taxids)
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 100, in process_kmer_distribution
    [g_taxid,mkmers,tkmers] = genome_str.split(':')
ValueError: not enough values to unpack (expected 3, got 1)
I'm wondering how I can fix this?
Many thanks
@phlatphish Can you try running without the -x flag?
Otherwise, I did fix the script. When -x was specified, I had accidentally left kraken2 as the default, which ignored the -y flag. The newest version fixes this.
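If you are on an older build and want to see which Kraken variants sit in the directory you pass with -x, something like the following may help. It is only a rough sketch: the helper name `kraken_tools_in` is made up for illustration, and it just looks for executables named kraken, kraken2 and krakenuniq in one directory. It does not reproduce what bracken-build does internally.

```python
import os
import shutil


def kraken_tools_in(directory, names=("kraken", "kraken2", "krakenuniq")):
    """Map each tool name to its path inside `directory`, or None if absent.

    Hypothetical helper for illustration; not part of Bracken.
    """
    # shutil.which accepts a search path, so restrict it to this one directory.
    return {name: shutil.which(name, path=directory) for name in names}


if __name__ == "__main__":
    # Directory taken from the -x argument in the report above.
    bin_dir = os.path.expanduser("~/miniconda3/envs/biokraken/bin")
    for name, path in kraken_tools_in(bin_dir).items():
        print(f"{name}: {path or 'not found'}")
```

If more than one of the three shows up, that matches the situation in which the kraken2 check was triggered despite -y krakenuniq.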
@fengyuchengdu can you open a new issue? I think your kmer_distribution file might be wrong. I need to see the kmer_distribution file you downloaded or are using.
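Before posting the file, it may be worth checking which lines would trip the `genome_str.split(':')` call in the traceback. The sketch below is not Bracken code. It assumes the layout that call implies: each data line holds a mapped taxid, a tab, and then space-separated entries of the form `taxid:mapped_kmers:total_kmers`, with a single header line on top. The function name `find_malformed_entries` and the `skip_header` parameter are hypothetical.

```python
import sys


def find_malformed_entries(path, skip_header=True, max_report=10):
    """Return up to `max_report` (line_number, entry) pairs that are not 'a:b:c'.

    Assumed layout (not confirmed here): <taxid>\t<e1> <e2> ... per line,
    where each entry has exactly three colon-separated fields.
    """
    bad = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if skip_header and lineno == 1:
                continue  # assumption: the first line is a header
            stripped = line.strip()
            if not stripped:
                continue  # ignore blank lines
            fields = stripped.split("\t")
            if len(fields) < 2:
                # No tab-separated entry column at all.
                bad.append((lineno, stripped))
            else:
                for entry in fields[1].split(" "):
                    if len(entry.split(":")) != 3:
                        bad.append((lineno, entry))
            if len(bad) >= max_report:
                return bad[:max_report]
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    for lineno, entry in find_malformed_entries(sys.argv[1]):
        print(f"line {lineno}: unexpected entry {entry!r}")
```

Run it with the path to the .kmer_distrib file as the only argument. Any line it reports is a candidate for the "expected 3, got 1" error, for example if the file was truncated during download or belongs to a different tool.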