
Is a fastq file converted from a Dorado unsorted BAM with "samtools fastq" suitable for m6anet?

Status: Open. Tesdhi opened this issue 8 months ago • 2 comments

Hi Matthias (tagging you here @mocherry),

  1. You can pass the --emit-fastq flag to dorado basecaller, which emits a fastq file; this is sufficient for running nanopolish and m6anet downstream.

  2. You can use minimap2 and samtools to get a sorted.bam file:


minimap2 -ax map-ont -uf -t 3 --secondary=no <MMI> <PATH/TO/FASTQ.GZ> > <PATH/TO/SAM> 2>> <PATH/TO/SAM_LOG>

samtools view -Sb <PATH/TO/SAM> > <PATH/TO/BAM>

samtools sort <PATH/TO/BAM> -o <PATH/TO/SORTED.BAM> 

samtools index <PATH/TO/SORTED.BAM>

  3. You can then use the fastq file and the fast5 files (or convert the pod5 files to fast5 files with pod5 convert to_fast5) to run nanopolish index.

  4. Then, you can run nanopolish eventalign with the fast5, fastq, sorted.bam, and the reference you aligned to, which will give you an eventalign.txt file to input to m6anet dataprep.
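The four steps above can be chained into a single script. Below is a minimal bash sketch under stated assumptions: the helpers run, run_to and m6anet_prep, and all arguments (basecalling model, POD5 directory, minimap2 .mmi index, the transcriptome FASTA that the index was built from and that eventalign uses as --genome, output directory, thread count) are hypothetical placeholders. The nanopolish eventalign and m6anet dataprep options follow the usage shown in the m6anet documentation; check them against your installed versions. The script indexes the sorted BAM, since that is the file eventalign reads. Setting DRY_RUN=1 prints each command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch of the dorado -> minimap2 -> samtools -> nanopolish -> m6anet dataprep
# workflow. Helper names and all paths are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands without executing them.
set -euo pipefail

# Print a command, then run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# Like run(), but send the command's stdout to the file given as the first argument.
run_to() {
  local out=$1; shift
  echo "+ $* > $out"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@" > "$out"; fi
}

# Usage: m6anet_prep MODEL POD5_DIR REF_MMI TRANSCRIPTS_FA OUT_DIR [THREADS]
m6anet_prep() {
  local model=$1 pod5_dir=$2 mmi=$3 fasta=$4 out=$5 threads=${6:-3}
  run mkdir -p "$out"

  # Step 1: basecall and emit FASTQ directly.
  run_to "$out/reads.fastq" dorado basecaller "$model" "$pod5_dir" --emit-fastq

  # Step 2: align, convert to BAM, sort, then index the sorted BAM.
  run_to "$out/reads.sam" minimap2 -ax map-ont -uf -t "$threads" --secondary=no "$mmi" "$out/reads.fastq"
  run_to "$out/reads.bam" samtools view -Sb "$out/reads.sam"
  run samtools sort "$out/reads.bam" -o "$out/reads.sorted.bam"
  run samtools index "$out/reads.sorted.bam"

  # Step 3: convert POD5 to FAST5 and link each read to its raw signal.
  run pod5 convert to_fast5 "$pod5_dir"/*.pod5 --output "$out/fast5"
  run nanopolish index -d "$out/fast5" "$out/reads.fastq"

  # Step 4: event alignment, then m6anet data preparation.
  run_to "$out/eventalign.txt" nanopolish eventalign \
    --reads "$out/reads.fastq" --bam "$out/reads.sorted.bam" --genome "$fasta" \
    --scale-events --signal-index --summary "$out/summary.txt" --threads "$threads"
  run m6anet dataprep --eventalign "$out/eventalign.txt" \
    --out_dir "$out/dataprep" --n_processes "$threads"
}
```

For example, DRY_RUN=1 m6anet_prep <MODEL> pod5 ref.mmi transcripts.fa results 8 lists the ten commands in order; drop DRY_RUN=1 to execute them.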

Not sure whether you are open to using the command line, but you can also check out the nf-core/nanoseq pipeline, which runs all of these steps for you.

Thanks!

Best wishes,

Yuk Kei

Originally posted by @yuukiiwa in #155

Hello Yuk Kei!

Regarding step 1, where you mentioned getting the "fastq" file from "dorado":

Would it be an acceptable alternative to use "samtools fastq" to convert the unsorted BAM file produced by "dorado basecaller" into a "fastq" file, and use that as the input "fastq" for "nanopolish eventalign"?

Thank you!

Tesdhi, Apr 24 '25 09:04

Hi @Tesdhi,

I think using samtools fastq to get the fastq file should work as input for f5c eventalign (preferred) or nanopolish eventalign.

Just a reminder: you will need to sort and index the alignment with samtools before running eventalign:

samtools sort myfile.sam -o myfile_sorted.bam && samtools index myfile_sorted.bam 
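A minimal bash sketch of the samtools fastq route discussed here, assuming a hypothetical unaligned BAM from dorado basecaller (calls.bam below), a minimap2 index, and an output prefix; the helper names run, run_to and bam_to_sorted are also hypothetical. The FASTQ it writes is what goes to nanopolish index and to eventalign --reads, and the sorted, indexed BAM is what goes to --bam. Setting DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: FASTQ from dorado's unaligned BAM via samtools fastq, then align,
# sort and index. Helper names and paths are hypothetical placeholders.
set -euo pipefail

# Print a command, then run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# Like run(), but send the command's stdout to the file given as the first argument.
run_to() {
  local out=$1; shift
  echo "+ $* > $out"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@" > "$out"; fi
}

# Usage: bam_to_sorted UNALIGNED_BAM REF_MMI PREFIX
bam_to_sorted() {
  local ubam=$1 mmi=$2 prefix=$3
  # FASTQ for nanopolish index and eventalign --reads.
  run_to "$prefix.fastq" samtools fastq "$ubam"
  # Align the FASTQ (not the unaligned BAM) to the reference.
  run_to "$prefix.sam" minimap2 -ax map-ont -uf --secondary=no "$mmi" "$prefix.fastq"
  # eventalign needs a coordinate-sorted, indexed BAM.
  run samtools sort "$prefix.sam" -o "${prefix}_sorted.bam"
  run samtools index "${prefix}_sorted.bam"
}
```

For example, DRY_RUN=1 bam_to_sorted calls.bam ref.mmi myfile prints four commands, ending with samtools index myfile_sorted.bam.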

Thanks!

Best wishes, Yuk Kei

yuukiiwa, Apr 27 '25 13:04

Hello Yuk Kei!

Thank you for your help! I have got the final output, and I have another question about it. I compared the "probability_modified" values in "data.indiv_proba" with those in "data.site_proba" and found that the site-level "probability_modified" is not the sum of the per-read "probability_modified" values for that site. So my question is: how is the "probability_modified" in "data.site_proba" calculated for each site?
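To reproduce the comparison described above, the bash/awk sketch below groups data.indiv_proba by site and reports the number of reads, the sum and the mean of probability_modified. It assumes (unverified here) that the file is a comma-separated table whose header includes transcript_id, transcript_position and probability_modified; the function name summarise_indiv is hypothetical. It makes no assumption about how m6anet aggregates read-level probabilities. A plain sum is not expected to match the site-level value in any case, because a sum of probabilities is not itself a probability and can exceed 1.

```bash
#!/usr/bin/env bash
# Sketch: per-site summary of read-level probabilities for comparison with
# data.site_proba. Assumed header columns: transcript_id, transcript_position,
# probability_modified (an assumption about the file layout).
set -euo pipefail

# Usage: summarise_indiv data.indiv_proba.csv
# Prints: transcript_id,transcript_position,n_reads,sum_proba,mean_proba
summarise_indiv() {
  awk -F, '
    NR == 1 {
      # Locate the needed columns by header name rather than by position.
      for (i = 1; i <= NF; i++) col[$i] = i
      t = col["transcript_id"]; p = col["transcript_position"]; q = col["probability_modified"]
      if (!t || !p || !q) { print "missing expected column" > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    {
      key = $t "," $p
      if (!(key in n)) order[++k] = key   # keep first-seen site order
      n[key]++
      s[key] += $q
    }
    END {
      if (bad) exit 1
      print "transcript_id,transcript_position,n_reads,sum_proba,mean_proba"
      for (j = 1; j <= k; j++) {
        key = order[j]
        printf "%s,%d,%.4f,%.4f\n", key, n[key], s[key], s[key] / n[key]
      }
    }' "$1"
}
```

For a toy file with two reads at tx1 position 100 (0.9 and 0.5), the output row is tx1,100,2,1.4000,0.7000; the sum of 1.4 illustrates why a raw sum cannot serve as a site-level probability.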

Thank you!

Tesdhi, Jun 09 '25 09:06