How to handle live basecall fast5 files (pass/fail) with rebasecalled 'hac' fastq files.

Open callumparr opened this issue 4 years ago • 0 comments

I recently ran quite a few DRS libraries and want to run poly-A analysis using nanopolish. I'd say it's only worth running on PASS reads. However, I am running into the issue that the original raw fast5 files were organized into PASS or FAIL based on the concurrent live basecall (fast model). I rebasecalled using the guppy high-accuracy config, feeding in both PASS and FAIL. I deleted the old FASTQ, as I was only interested in the live basecall to monitor things like translocation speed and q-score metrics during the run.

MinKNOW outputs the following when using live basecalling (fast model):

Experiment1/
    fast5_pass/
        file1.fast5...
    fast5_fail/
        file1.fast5...
    FASTQ_pass/
        file.fastq...
    FASTQ_fail/
        file.fastq...

Both fast5 directories were then used as input for guppy:

hac_output/
    FASTQ_pass/
        file1.fastq...
    FASTQ_fail/
        file.fastq...
    sequencing_summary.txt
    sequence_telemetry.txt

The problem now is that I don't have a direct file-to-file correspondence between the original fast5 files and the new guppy output FASTQ. A lot of reads in the fast5 fail folder will of course now pass the q-score cutoff when rebasecalled with guppy hac.
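One way to recover the correspondence is to go through the read IDs in the guppy sequencing summary rather than through file names. The following is a minimal Python sketch, not a tested pipeline. It assumes the summary is tab-separated and has `read_id`, `passes_filtering`, and `filename` columns, where `filename` names the source fast5 file; column names differ between guppy versions, so check the header of your own file. The function name and the example rows are made up for illustration.

```python
import csv
import io


def fast5_for_pass_reads(summary_handle, fast5_col="filename",
                         pass_col="passes_filtering"):
    """Map read_id -> source fast5 file name for reads that pass
    filtering in the rebasecalled (hac) summary.

    Column names are assumptions; adjust them to match the header
    of your own sequencing summary.
    """
    reader = csv.DictReader(summary_handle, delimiter="\t")
    mapping = {}
    for row in reader:
        # Keep only reads flagged as passing in the new basecall
        if row[pass_col].strip().upper() == "TRUE":
            mapping[row["read_id"]] = row[fast5_col]
    return mapping


# Tiny made-up summary, standing in for the real guppy output
example = io.StringIO(
    "filename\tread_id\tpasses_filtering\n"
    "fileA.fast5\tread-1\tTRUE\n"
    "fileB.fast5\tread-2\tFALSE\n"
    "fileB.fast5\tread-3\tTRUE\n"
)
pass_reads = fast5_for_pass_reads(example)
print(pass_reads)
```

With a real run, the `io.StringIO` object would be replaced by an open handle on the guppy summary file. The resulting mapping shows which fast5 files, from either the pass or the fail folder, hold signal for reads that now pass.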

Is it OK to create the nanopolish index using only the FASTQ PASS reads, but supplying both the pass and fail fast5 directories so I can ensure the index can match each FASTQ read to its fast5 signal? I would use the sequencing_summary.txt file from the guppy output, not the one from MinKNOW.
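For concreteness, the indexing call described above could be assembled as in the sketch below. It only builds and prints the command; it does not run nanopolish. It assumes that `nanopolish index` accepts `-d` once per fast5 directory and `-s` for the sequencing summary, which is worth confirming against `nanopolish index --help` for your version. The file `hac_pass.fastq` is a hypothetical concatenation of the rebasecalled pass FASTQ files, and the helper name is invented for illustration.

```python
import shlex


def nanopolish_index_cmd(fast5_dirs, summary, fastq):
    """Build the argument list for a nanopolish index call that
    searches several fast5 directories for the reads in one FASTQ."""
    cmd = ["nanopolish", "index"]
    for directory in fast5_dirs:
        # One -d per directory holding raw signal
        cmd += ["-d", directory]
    # -s points at the sequencing summary from the rebasecall
    cmd += ["-s", summary, fastq]
    return cmd


cmd = nanopolish_index_cmd(
    ["Experiment1/fast5_pass", "Experiment1/fast5_fail"],
    "hac_output/sequencing_summary.txt",
    "hac_pass.fastq",  # hypothetical merged pass FASTQ
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it once the paths are confirmed.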

I assume failed fast5 will always be excluded or filtered out later from analysis?

callumparr, Mar 26 '20 04:03