Bracken
Segmentation fault (core dumped) in converting sequence
Dear All,
I have created a customized Kraken2 database and I wanted to run Bracken on that. I got an error message:
Loading database information... done.
948450 sequences (82820.37 Mbp) processed in 2235.571s (25.5 Kseq/m, 2222.80 Mbp/m).
948316 sequences classified (99.99%)
134 sequences unclassified (0.01%)
0 sequences converted...
1 sequences converted (finished: NC_032425.1)
2 sequences converted (finished: NC_048658.1)
[...]
477518 sequences converted (finished: NZ_WNNK01000096.1)1
477519 sequences converted (finished: NZ_WNNK01000097.1)
/software/UHTS/Analysis/braken/2.6.2/bin/bracken-build: line 170: 71821 Segmentation fault (core dumped) kmer2read_distr --seqid2taxid $DATABASE/seqid2taxid.map --taxonomy $DATABASE/taxonomy/ --kraken $DATABASE/database.kraken --output $DATABASE/database${READ_LEN}mers.kraken -k ${KMER_LEN} -l ${READ_LEN} -t ${THREADS}
However, Bracken does not stop at this step and still produces an output file, database100mers.kraken, which turns out to be truncated at the end:
tail database100mers.kraken
NZ_WNNK01000077.1 1055469 1055469:526
NZ_WNNK01000078.1 1055469 1055469:526
NZ_WNNK01000079.1
Has anyone else run into this error? Any help would be appreciated.
Thanks
Stefano
Can you delete the database100mers.kraken file and try to recreate it? It looks like the machine ran out of memory, which caused the segmentation fault, but I have not run into this issue myself.
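Before deleting and rebuilding, it may be worth confirming that the file really ends in a partial record. The following is only a minimal sketch, not part of Bracken: the function names are made up for illustration, and it assumes (based solely on the tail output shown above) that a complete record has a sequence ID, a taxid, and at least one taxid:count pair.

```python
def last_record(path):
    """Return the whitespace-separated fields of the last non-empty line.

    Reads the whole file line by line, so it can take a while on a
    large database100mers.kraken file.
    """
    last = ''
    with open(path) as handle:
        for line in handle:
            if line.strip():
                last = line
    return last.split()


def looks_truncated(path):
    """Guess whether the final record was cut off.

    Assumption (taken from the tail output in this thread): a complete
    record looks like 'seqid taxid taxid:count [taxid:count ...]'.
    """
    fields = last_record(path)
    if len(fields) < 3:
        return True
    # Every field after the first two should be a 'taxid:count' pair.
    return not all(':' in field for field in fields[2:])
```

Calling looks_truncated('database100mers.kraken') on the file shown above would report True, because its last line contains only the sequence ID.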
Dear all,
I have the same problem with my customized Kraken2 database.
Kraken runs smoothly, but I run into problems when building the Bracken database.
I tried many approaches. Bracken works with the NCBI database, but not with my own database.
I also tested with a single genome and still got the same error, so I suspect it is not a memory issue. Kraken itself works fine with my full custom database, so I don't know the cause. Any help would be greatly appreciated!
Thanks, Tianhui
Did you solve this problem? I am getting the same error.
I removed one genome from the database in the first Bracken step, where I had to list the genomes in my database. Kraken2 had not detected that genome in my samples anyway. After that, it worked. Since my database was customized (I basically downloaded all the genomes from NCBI), there may have been an issue with that particular genome.
Best
Stefano
Hey. I also had this issue with a custom kraken database that I built. Wanted to share what I found to be the issues and how I solved them.
Issue 1 Unmapped Sequences: As other users have noted in other threads about this seg fault, unmapped sequences in the database will cause it. The kraken-build script tells you when this happens and writes a file called unmapped.txt containing the header names of the unmapped sequences in your fasta file. I removed the ones that came from genomes I decided I did not need and kept the others. For the genomes I wanted, I added the sequence accession ID and taxid (looked up manually from the genome accession on the NCBI website) to the kraken_db/taxonomy/nucl_gb.accession2taxid file (gi can be na). Then make sure that taxid is present in the first column of the nodes.dmp file; if it is not, see Issue 2 below, which also caused a seg fault for me. Once the files are edited, rebuild the kraken database with the modified taxonomy files and retry bracken-build. (Maybe check for Issue 2 first!)
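For anyone scripting the Issue 1 fix, here is a rough sketch of building the rows to append to nucl_gb.accession2taxid. It is not part of Kraken or Bracken, and the function names are hypothetical. It assumes the file uses the four tab-separated NCBI columns (accession, accession.version, taxid, gi) and that you have already looked up the taxid for each unmapped sequence yourself.

```python
def accession2taxid_row(accession_version, taxid, gi='na'):
    """Format one row for nucl_gb.accession2taxid.

    Assumption: columns are accession, accession.version, taxid, gi,
    separated by tabs. The gi column is filled with 'na' by default.
    """
    # 'NZ_WNNK01000097.1' -> 'NZ_WNNK01000097'
    accession = accession_version.rsplit('.', 1)[0]
    return '\t'.join([accession, accession_version, str(taxid), gi]) + '\n'


def rows_for_unmapped(unmapped_ids, taxid_by_id):
    """Build rows for the unmapped IDs that have a manually curated taxid.

    unmapped_ids: iterable of sequence IDs (e.g. read from unmapped.txt).
    taxid_by_id:  dict of sequence ID -> taxid that you looked up by hand.
    Returns (rows, still_unknown) so nothing is silently dropped.
    """
    rows, still_unknown = [], []
    for seq_id in unmapped_ids:
        if seq_id in taxid_by_id:
            rows.append(accession2taxid_row(seq_id, taxid_by_id[seq_id]))
        else:
            still_unknown.append(seq_id)
    return rows, still_unknown
```

The returned rows can be appended to the accession2taxid file; anything in still_unknown either needs a taxid or should be removed from the fasta file before rebuilding.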
Issue 2 Missing Taxids: Hopefully my pain is your gain here, as this was a headache and a half to find. The kraken_db/seqid2taxid.map file contains the mapped sequences for the database. Every taxid in its second column must be present in the first column of the kraken_db/taxonomy/nodes.dmp file, or you get the segmentation fault. I had some mapped taxids that were missing from the first column of nodes.dmp. The second column of nodes.dmp is the parent taxid. You can manually edit nodes.dmp to add the missing taxids, with parent IDs obtained from NCBI here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. I filled in the first 3 columns (taxid, parent taxid, rank) of each new line and left the remaining columns blank. Make sure you use a tab ('\t') as the column separator when editing this file. Below is a silly python script I used to check for this issue; it prints the taxids from seqid2taxid.map that are missing from nodes.dmp.
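#!/usr/bin/env python3
# Run from inside the kraken database directory.
# Prints the taxids used in seqid2taxid.map that have no entry in taxonomy/nodes.dmp.
childs = set()      # taxids from the first column of nodes.dmp
seq_taxids = set()  # taxids from the second column of seqid2taxid.map
with open('taxonomy/nodes.dmp') as nodes:
    for line in nodes:
        # Drop the tabs around each '|' delimiter, then take the first field.
        childs.add(line.strip().replace('\t', '').split('|')[0])
with open('seqid2taxid.map') as seq_map:
    for line in seq_map:
        if line.strip():  # skip blank lines
            seq_taxids.add(line.strip().split('\t')[1])
print('Len childs:', len(childs))
print('Len seq_taxids:', len(seq_taxids))
print('Len seq_taxids & childs:', len(seq_taxids & childs))
print('Seq_taxids - childs:\n', seq_taxids - childs)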
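For the manual nodes.dmp edit itself, a small helper can reduce the chance of a separator mistake. This is only a sketch with hypothetical function names. It assumes the NCBI taxdump layout in which fields are separated by tab, pipe, tab and each line ends with tab, pipe. Rather than hard-coding the number of columns, it counts them from a line already present in your nodes.dmp.

```python
def count_columns(nodes_line):
    """Count the fields in an existing nodes.dmp line.

    Assumption: fields are separated by '\t|\t' and the line ends in '\t|'.
    """
    text = nodes_line.rstrip('\n')
    if text.endswith('\t|'):
        text = text[:-2]  # drop the line terminator before splitting
    return len(text.split('\t|\t'))


def make_node_line(taxid, parent_taxid, rank, n_columns):
    """Build a nodes.dmp line with the first three columns filled in.

    The remaining columns are left blank, as described in the post above.
    """
    if n_columns < 3:
        raise ValueError('nodes.dmp needs at least taxid, parent taxid and rank')
    fields = [str(taxid), str(parent_taxid), rank] + [''] * (n_columns - 3)
    return '\t|\t'.join(fields) + '\t|\n'
```

A typical use would be to read the first line of taxonomy/nodes.dmp, pass it to count_columns, and then call make_node_line once per missing taxid with the parent ID and rank looked up at NCBI.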
Hope this helps someone out there slamming their mouse against the wall trying to troubleshoot this. If this program is still supported, maybe consider a pre-check for these issues before running the sub-scripts in bracken-build?
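As a rough idea of what such a pre-check could look like, here is a sketch that combines both issues into one function. It is not part of Bracken; the function name is hypothetical, and it assumes unmapped.txt, seqid2taxid.map and taxonomy/nodes.dmp all live directly under the database directory.

```python
import os


def precheck(db_dir):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    # Issue 1: a non-empty unmapped.txt means some sequences have no taxid.
    unmapped = os.path.join(db_dir, 'unmapped.txt')
    if os.path.exists(unmapped) and os.path.getsize(unmapped) > 0:
        problems.append('unmapped.txt is not empty')

    # Issue 2: every taxid in seqid2taxid.map must appear in nodes.dmp.
    node_ids = set()
    with open(os.path.join(db_dir, 'taxonomy', 'nodes.dmp')) as nodes:
        for line in nodes:
            node_ids.add(line.split('|')[0].strip())

    missing = set()
    with open(os.path.join(db_dir, 'seqid2taxid.map')) as seq_map:
        for line in seq_map:
            parts = line.rstrip('\n').split('\t')
            if len(parts) >= 2 and parts[1] not in node_ids:
                missing.add(parts[1])
    if missing:
        problems.append('taxids missing from nodes.dmp: ' + ', '.join(sorted(missing)))

    return problems
```

Running precheck('kraken_db') before bracken-build and stopping if the list is non-empty would catch both cases described above before kmer2read_distr is started.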