Read length for nanopore

Open mbhall88 opened this issue 6 years ago • 11 comments

Upfront, I know Bracken wasn't necessarily designed to run on nanopore data.

How would you recommend setting the read length parameter: median read length, average, minimum (as in #30), or a hard threshold?
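For anyone wanting to compare these candidate values on their own data, here is a minimal sketch in plain Python. It assumes an uncompressed FASTQ with exactly four lines per record; the function names and the tiny example records are made up for illustration and are not part of Bracken.

```python
import statistics
from typing import Iterable, Iterator


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each read in a four-line-per-record FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def summarise_lengths(lengths: Iterable[int]) -> dict:
    """Return the candidate values: minimum, median and mean (plus max and count)."""
    values = sorted(lengths)
    if not values:
        raise ValueError("no reads found")
    return {
        "n_reads": len(values),
        "min": values[0],
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": values[-1],
    }


# Made-up records standing in for a real file (assumption: 4 lines per record).
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "ACGTACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIIIIIII",
]
summary = summarise_lengths(fastq_read_lengths(example))
print(summary)
```

With a real file, the same functions can be fed an open file handle, e.g. `with open("reads.fastq") as fh:` (a hypothetical path), or `gzip.open(path, "rt")` for compressed input. This only reports the numbers; it does not say which of them is the right choice for Bracken.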

mbhall88 avatar Nov 28 '18 12:11 mbhall88

I honestly am hesitant to say as it could affect the results a bit. I have yet to test this on nanopore data where read lengths are so varied.

My gut says minimum read length, but I would really like to test this further before being certain.

jenniferlu717 avatar Nov 28 '18 19:11 jenniferlu717

@jenniferlu717

I'm currently facing a similar issue with Illumina HiSeq NGS read data whose read lengths vary from 30 to 301 bp after QC (Trimmomatic followed by FastQC). Is this issue resolved, or is it still under development? I can see in the README that the "easy version" of Bracken has some ways to handle reads of multiple lengths (see link below). Please confirm whether this suits my requirement. https://github.com/jenniferlu717/Bracken#running-bracken-easy-version

Thanks in advance,

Regards, Vijay N

narsapuramvijaykumar avatar Jan 28 '19 10:01 narsapuramvijaykumar

Facing the same problem here: we have a variety of sequencers generating anything from 150 bp to 5 kb (PacBio). I'm tempted to create two databases so that I can do chemistry-dependent analyses, but if the 150 bp database would work for the longer reads, it would simplify handing this off to other folks. Any update on your tests, @jenniferlu717?

wolfgangrumpf avatar May 16 '19 15:05 wolfgangrumpf

From the paper, we know that r is used to generate a database in which the k-mers have length r, equal to the read length, so that we can tell how many k-mers are unique to genome Si. I am facing the same problem as you, but I still don't know how to choose r; my reads range from 150 to 300 bp.

lancer-lu avatar Sep 02 '19 00:09 lancer-lu

@narsapuramvijaykumar Have you solved this problem?

lancer-lu avatar Sep 02 '19 00:09 lancer-lu

Hi @jenniferlu717,

Like the other folks posting here, I was wondering what read length I should build a database for. I'm analyzing a fairly diverse dataset where reads are 45, 75, or 100 bp long. Additionally, I will have to trim some of the reads even further due to poor quality. Do you recommend preparing and using different databases, or one database based on the minimum length?
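If one went the "different databases" route, the build commands could be generated along these lines. This is only a sketch: the `-d`, `-t`, `-k` and `-l` options of `bracken-build` are taken from the Bracken README and should be checked against the installed version, while the helper name, the database path and the thread count are hypothetical. The k-mer length is assumed to be 35 (the Kraken 2 default) and must match the one used to build the Kraken database.

```python
import shlex
from typing import Iterable, List


def bracken_build_commands(
    kraken_db: str,
    read_lengths: Iterable[int],
    kmer_len: int = 35,  # assumption: Kraken 2 default k-mer length
    threads: int = 4,    # assumption: arbitrary thread count
) -> List[List[str]]:
    """Build one `bracken-build` argument list per distinct read length."""
    commands = []
    for read_len in sorted(set(read_lengths)):
        commands.append([
            "bracken-build",
            "-d", kraken_db,
            "-t", str(threads),
            "-k", str(kmer_len),
            "-l", str(read_len),
        ])
    return commands


# "my_kraken_db" is a placeholder path, not a real database.
commands = bracken_build_commands("my_kraken_db", [100, 45, 75, 45])
for cmd in commands:
    print(shlex.join(cmd))
```

The sketch only prints the commands; each list could be passed to `subprocess.run(cmd, check=True)` to actually run it. It does not settle whether separate databases or a single minimum-length database gives better estimates.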

Thank you for your insights!

Midnighter avatar Nov 13 '20 15:11 Midnighter

I've been thinking a bit more about this and I'm actually wondering if Bracken is needed at all for long reads. I wonder if someone here has more experience because I would assume that with the long reads, kraken2 can match them quite specifically to one of the reference genomes. So I wonder if there is a need even to post-process with Bracken.

Midnighter avatar Nov 03 '22 15:11 Midnighter

I also have a diverse dataset with multiple read lengths. I'm thinking of setting the read length to the minimum, but would appreciate any guidance.

pgcudahy avatar Feb 21 '23 09:02 pgcudahy

Please can I ask if you have had a chance to test this @jenniferlu717? It would be great to know if we can use Bracken with confidence for nanopore.

Many thanks,

Jack

jackwgoodall avatar Mar 27 '23 10:03 jackwgoodall

Hello @Midnighter, I wonder if you have any updates on this issue. I am analysing nanopore 16S data (MinION) and have already classified the reads with Kraken2. Is further processing with Bracken necessary? If yes, is the minimum read length the optimal choice? If not, how would one calculate the relative taxonomic abundance from the Kraken2 output?
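To make the last question concrete, one naive way to get proportions straight from a Kraken2 report is sketched below. It assumes the standard six-column, tab-separated report (percentage, clade reads, direct reads, rank code, taxid, name) and simply divides each taxon's clade read count by the total at the chosen rank. The function name and the report lines are made up, and this is not Bracken's re-estimation: reads classified only above the chosen rank are ignored.

```python
from typing import Dict, Iterable


def rank_abundance(report_lines: Iterable[str], rank: str = "G") -> Dict[str, float]:
    """Relative abundance per taxon at one rank, from clade read counts."""
    counts: Dict[str, int] = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:  # skip lines that are not standard report rows
            continue
        _pct, clade_reads, _direct, rank_code, _taxid, name = fields
        if rank_code.strip() == rank:
            key = name.strip()  # names are indented by taxonomic depth
            counts[key] = counts.get(key, 0) + int(clade_reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up report lines with invented read counts, for illustration only.
report = [
    " 10.00\t10\t10\tU\t0\tunclassified",
    " 90.00\t90\t0\tR\t1\troot",
    " 60.00\t60\t5\tG\t561\t        Escherichia",
    " 30.00\t30\t2\tG\t1279\t        Staphylococcus",
]
abundance = rank_abundance(report, rank="G")
for name, fraction in abundance.items():
    print(f"{name}\t{fraction:.3f}")
```

Whether such raw proportions are good enough for long reads is exactly the open question in this thread, so treat the result as a rough summary.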

Many thanks in advance!

iaposto avatar Apr 26 '23 09:04 iaposto

I don't have a real answer but I can say that we decided to not run Bracken on nanopore reads for taxprofiler.

Midnighter avatar Apr 27 '23 13:04 Midnighter