Bracken icon indicating copy to clipboard operation
Bracken copied to clipboard

`Error: no reads found` but the report file contains read counts

Open MatteoSchiavinato opened this issue 3 years ago • 1 comments

I'm using bracken in a pipeline, and I'm encountering some issues. I have used it before in the same system without any issue, so I'm a bit confused about this.

I'm running it on a report file that contains read counts produced by kraken2. It is not an mpa report, so it's the correct version.

With the following command:

python3.8 \
est_abundance.py \
-i test.report \
-o test.bracken \
-k database100mers.kmer_distrib \
-l S \
-t 0  

I'm getting the following error:

>> Checking report file: test.report
Error: no reads found. Please check your Kraken report

However the report contains several taxa each of which has counts. So I don't understand what the issue is.

I tried to modify the est_abundance.py script to debug it and I found that there are a few commented lines that switch certain commands (e.g. new_all_reads = float(all_reads) + float(added_reads)) with newer versions (e.g. new_all_reads = float(added_reads)).

If I restore the commented commands, removing their new versions, the program works but the output contains 0 reads for each taxa (each read count for each taxa gets subtracted all of its reads, becoming 0).

Could this be a situation where bracken could not estimate the read distribution, and hence no read was added or subtracted? And if so, can it handle such scenario or does it end up in a ZeroDivisionError or something like that?

MatteoSchiavinato avatar Jun 09 '21 15:06 MatteoSchiavinato

I have also had this issue. I have several hundred kraken reports I am processing, one of which routinely fails to be processed by Bracken.

Does your kraken report have a similar structure as mine? Note the F, F1, F2 and S rows. I think it is happening because there is no genus designation and it hops from Family to species with no in between - I think genus is required for the species level for some reason.

If i change the level flag from "S" (species) to "G" (genus), it still doesnt work (Genus is missing and maybe Species requires checking the genus designation first?), however if i change the level flag to "F" (Family) then i get a report successfully generated.

The error message could definitely be a bit more descriptive but I think its because (at least in my case) there is no genus designation which for some reason breaks the species level (and also obviously the genus level).

 2.08	1	1	U	0	unclassified
 97.92	47	0	R	1	root
 97.92	47	0	R1	131567	  cellular organisms
 97.92	47	0	D	2	    Bacteria
 97.92	47	0	P	1224	      Proteobacteria
 97.92	47	0	C	1236	        Gammaproteobacteria
 97.92	47	0	O	91347	          Enterobacterales
 97.92	47	3	F	543	            Enterobacteriaceae
 79.17	38	0	F1	191675	              unclassified Enterobacteriaceae
 79.17	38	0	F2	36866	                unclassified Enterobacteriaceae (miscellaneous)
 79.17	38	38	S	693444	                  Enterobacteriaceae bacterium strain FGI 57
  4.17	2	0	G	561	              Escherichia
  4.17	2	2	S	562	                Escherichia coli
  2.08	1	0	G	547	              Enterobacter
  2.08	1	1	S	1692238	                Enterobacter sp. FY-07
  2.08	1	0	G	590	              Salmonella
  2.08	1	1	S	28901	                Salmonella enterica
  2.08	1	0	G	570	              Klebsiella
  2.08	1	0	S	573	                Klebsiella pneumoniae
  2.08	1	1	S1	72407	                  Klebsiella pneumoniae subsp. pneumoniae
  2.08	1	0	G	544	              Citrobacter
  2.08	1	0	G1	1344959	                Citrobacter freundii complex
  2.08	1	1	S	546	                  Citrobacter freundii

maxlcummins avatar Jun 24 '21 01:06 maxlcummins