Is a fastq file converted from an unsorted "Dorado" bam with "samtools fastq" suitable for m6anet?
Hi Matthias (tagging you here @mocherry),
You can pass the --emit-fastq flag to dorado basecaller, which will emit a fastq file; this is sufficient for running nanopolish and m6anet downstream.
You can use minimap2 and samtools to get a sorted.bam file:
minimap2 -ax map-ont -uf -t 3 --secondary=no <MMI> <PATH/TO/FASTQ.GZ> > <PATH/TO/SAM> 2>> <PATH/TO/SAM_LOG>
samtools view -Sb <PATH/TO/SAM> > <PATH/TO/BAM>
samtools sort <PATH/TO/BAM> -o <PATH/TO/SORTED.BAM>
samtools index <PATH/TO/SORTED.BAM>
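As a minimal sketch, the alignment steps above can be wrapped in one shell function. The function name `align_sort_index` and the output naming scheme are hypothetical; the minimap2 options are the ones from the post. It relies on `samtools sort` accepting SAM input directly, so the separate `samtools view -Sb` conversion is skipped here.

```shell
# Hypothetical helper: align reads and produce a sorted, indexed BAM.
# Arguments: reference (.mmi or FASTA), reads FASTQ, output prefix.
align_sort_index() {
    local ref="$1" fastq="$2" prefix="$3"

    # Alignments go to a SAM file; minimap2's log messages go to a separate file.
    minimap2 -ax map-ont -uf -t 3 --secondary=no "$ref" "$fastq" \
        > "$prefix.sam" 2> "$prefix.minimap2.log" || return 1

    # samtools sort reads SAM directly and writes a coordinate-sorted BAM.
    samtools sort "$prefix.sam" -o "$prefix.sorted.bam" || return 1

    # Index the sorted BAM (indexing an unsorted BAM fails).
    samtools index "$prefix.sorted.bam"
}

# Example call (placeholder paths):
#   align_sort_index transcriptome.mmi reads.fastq.gz sample1
```

This leaves `sample1.sam`, `sample1.sorted.bam` and its index next to each other; the intermediate SAM can be deleted once the sorted BAM exists.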
You can then use the fastq file and the fast5 files (or convert the pod5 files to fast5 files with pod5 convert to_fast5) and run nanopolish index.
Then, you can run nanopolish eventalign with the fast5, fastq, and sorted.bam, which will give you an eventalign.txt file to input to m6anet dataprep.
Not sure whether you are open to using the command line, but you can check out nf-core/nanoseq, which does all the steps for you.
Thanks!
Best wishes,
Yuk Kei
Originally posted by @yuukiiwa in #155
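The indexing, eventalign and dataprep steps described in the quoted post can be sketched roughly as follows. The function name, the output layout and the thread counts are hypothetical. The `--scale-events`, `--signal-index` and `--summary` options are the ones I recall from the m6anet documentation, so please check them against `nanopolish eventalign --help` for your installed version. The sketch also assumes the reference passed to `--genome` is the FASTA that the reads were aligned to.

```shell
# Hypothetical helper: index reads against raw signal, run eventalign,
# then prepare m6anet input. All paths are placeholders.
# If you only have pod5 files, convert them first with "pod5 convert to_fast5"
# (see its --help for the exact options).
eventalign_for_m6anet() {
    local fast5_dir="$1" fastq="$2" sorted_bam="$3" ref_fa="$4" out_dir="$5"

    mkdir -p "$out_dir" || return 1

    # Link each read in the FASTQ to its raw signal in the fast5 directory.
    nanopolish index -d "$fast5_dir" "$fastq" || return 1

    # Assumption: options as given in the m6anet documentation.
    nanopolish eventalign \
        --reads "$fastq" \
        --bam "$sorted_bam" \
        --genome "$ref_fa" \
        --scale-events --signal-index \
        --summary "$out_dir/summary.txt" \
        --threads 4 \
        > "$out_dir/eventalign.txt" || return 1

    # eventalign.txt is the input to m6anet dataprep.
    m6anet dataprep \
        --eventalign "$out_dir/eventalign.txt" \
        --out_dir "$out_dir/dataprep" \
        --n_processes 4
}

# Example call (placeholder paths):
#   eventalign_for_m6anet fast5/ reads.fastq sample1.sorted.bam transcriptome.fa results/
```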
Hello Yuk Kei!
For step 1, where you mentioned getting the "fastq" file from "dorado":
I am wondering whether, as an alternative, I could use "samtools fastq" to convert the unsorted bam file (produced by "dorado basecaller") to a "fastq" file and use that as the input "fastq" for "nanopolish eventalign"?
Thank you!
Hi @Tesdhi,
I think using samtools fastq to get the fastq file should work as input for f5c eventalign (preferred) or nanopolish eventalign.
Just a reminder, you will need to sort and index the alignments with samtools prior to running eventalign:
samtools sort myfile.sam -o myfile_sorted.bam && samtools index myfile_sorted.bam
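For the conversion itself, here is a small sketch, assuming the input is the unaligned BAM written by dorado basecaller. The function name `bam_to_fastq` is hypothetical. It converts with `samtools fastq` and then compares record counts as a sanity check; for an unaligned BAM the two numbers should match.

```shell
# Hypothetical helper: convert an unaligned dorado BAM to FASTQ and
# check that no reads were lost in the conversion.
bam_to_fastq() {
    local bam="$1" fastq="$2" n_bam n_fastq

    # With no -0/-1/-2 options, samtools fastq writes the reads to stdout.
    samtools fastq "$bam" > "$fastq" || return 1

    # Records in the BAM vs. 4-line records in the FASTQ.
    n_bam=$(samtools view -c "$bam") || return 1
    n_fastq=$(( $(wc -l < "$fastq") / 4 ))

    echo "BAM records: $n_bam, FASTQ records: $n_fastq"
    [ "$n_bam" -eq "$n_fastq" ]
}

# Example call (placeholder paths):
#   bam_to_fastq calls.bam reads.fastq
```

The function returns a non-zero status when the counts differ, so it can gate the later alignment and eventalign steps in a script.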
Thanks!
Best wishes, Yuk Kei
Hello Yuk Kei!
Thank you for your help! I have got the final output, and I have another question about it. I checked the "probability_modified" values in "data.indiv_proba" and found that the "probability_modified" in "data.site_proba" is not simply the sum of the "probability_modified" values from "data.indiv_proba". So my question is: how is the "probability_modified" in "data.site_proba" calculated for each site?
Thank you!