jetson_nanopore_sequencing

Unblocked reads too long on playback runs

Open JensUweUlrich opened this issue 2 years ago • 3 comments

Hi,

thanks for this great repository, which helped me set up adaptive sampling so easily. I have an NVIDIA AGX Xavier and I'm testing adaptive sampling using the bulk fast5 file provided by the readfish authors (http://s3.amazonaws.com/nanopore-human-wgs/bulkfile/PLSP57501_20170308_FNFAF14035_MN16458_sequencing_run_NOTT_Hum_wh1rs2_60428.fast5). I set up the guppyd service as described and also set the parameters according to the optimal values you provided in the wiki. Unfortunately, I'm always getting an N50 of unblocked reads of about 1.1 kb, which is much too high. I would expect a value between 450 and 500 bases. By playing around with the basecall_server parameters (using, for example, the parameters you recommended for the NVIDIA NX) I could bring the N50 of unblocked reads down to 800 bases, but that's still not good enough. I just wanted to ask if you have experienced the same issues when running adaptive sampling on a playback run with bulk fast5 files. (I know that playback runs are resource hungry.)
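For reference, the N50 figures quoted here can be reproduced from a list of unblocked read lengths. The following is only a minimal sketch: the function name `n50` and the sample lengths are hypothetical and are not taken from readfish, MinKNOW, or Guppy.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (0 if empty).

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical unblocked read lengths in bases (illustration only).
unblocked = [300, 450, 480, 520, 1100, 1500]
print(n50(unblocked))
```

Because N50 is weighted by bases rather than by read count, a small number of long unblocked reads can pull the value up considerably even when most reads are rejected quickly.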

Best Jens

JensUweUlrich avatar Dec 06 '21 10:12 JensUweUlrich

Hi @JensUweUlrich - great to hear you're up and running on your Xavier AGX. In terms of adaptive sampling I'm guessing you may have missed the notes on the topic here? 😉

Long story short, adaptive sampling works perfectly on the Jetson Xavier boards when you run without live basecalling. I've reached out to ONT about this and they said they do have a fix coming for the Mk1c at some stage, so that should help us out as well.

Interestingly, another person has told me they have adaptive sampling running with live basecalling, even in playback mode on a Xavier AGX. Unfortunately, I haven't been able to replicate this at my end. It's even stranger to me given that ONT considered this a "bug" that they have a fix for...

So at this stage while we wait for the fix the "best" option is to run without live basecalling and then start a separate Guppy instance in the background.

sirselim avatar Dec 06 '21 17:12 sirselim

Hi @sirselim - I may have mixed things up in my post. Sorry for that. I disabled the live basecalling option within MinKNOW, because I read your detailed explanations about it. But even when the only basecaller running is the one dedicated to adaptive sampling, playback results in too much delay from the basecall_server. I played around a bit, using an additional laptop at home with the following setup: a) a Linux laptop running readfish, b) a Windows laptop running MinKNOW in playback mode, and c) the AGX Xavier for fast basecalling. With this setup I am able to push unblocked read lengths down to about 620 bp (when trying to deplete all human reads in the given bulk fast5 file). Still not optimal, but much better than 800 bp. I was just curious whether you have any experience with playback runs and adaptive sampling on the AGX Xavier.

JensUweUlrich avatar Dec 06 '21 20:12 JensUweUlrich

Hi @JensUweUlrich - thanks for the clarification. I was hoping the "easy fix" was to turn off basecalling!

Yeah, I've observed the same as you when running simulation mode on the AGX. I've also seen it on my big Linux workstation, which really shouldn't struggle at all. If you look at the Readfish GitHub issues you'll see a big thread where we discuss this. Unfortunately it was closed without resolution (link here).

I haven't tried simulating a different data set; that's on my ever-expanding list of things to do. Maybe there is something "odd" about the human data file provided via the Readfish repo...? As I mentioned previously, I've had people contact me saying they were happily doing adaptive sampling with real-time basecalling on simulation data. I need to contact these people and see if I can get to the bottom of it.

I guess most importantly, I have run multiple real flowcells with adaptive sampling on the AGX and NX, and confirm that they work as expected (minus the live basecalling).

I'm still no closer to understanding what is happening with the performance in simulation mode, and it's rather frustrating!

EDIT: I'd be interested to know what CPUs you've tested on and what performance you've observed.

sirselim avatar Dec 07 '21 04:12 sirselim