squigulator icon indicating copy to clipboard operation
squigulator copied to clipboard

Support for DNA methylation signal

Open xies4 opened this issue 2 years ago • 6 comments

Thank you for developing this amazing tool. I am wondering if there is a way to simulate reads with DNA methylation signals. If not, do you have any plan to include the function in the near future? Thanks again.

xies4 avatar Dec 15 '23 03:12 xies4

Hello,

This is something that is straightforward to implement given we already have methylation current level tables in f5c, yet, haven't found time to really implement it. Given that you asked now, I can get into implementing this sometime in the coming months.

hasindu2008 avatar Dec 15 '23 03:12 hasindu2008

Great! Looking forward to it! Thanks.

xies4 avatar Dec 15 '23 03:12 xies4

Hi, I am looking forward to it as well!

HLHsieh avatar Jan 29 '24 18:01 HLHsieh

Implementing support for this new RNA004 kit to squigulator. Will do the methylation as soon as that is finished.

hasindu2008 avatar Jan 30 '24 12:01 hasindu2008

@HLHsieh and @xies4 Thanks for your patience. A very preliminary methylation simulation feature has been now implemented under the "meth" branch at https://github.com/hasindu2008/squigulator/tree/meth. Note that I have yet tested only using a tiny corona-virus genome and there could be bugs/issues. I am planning to do some testing in the next few weeks, however, if you are interested you can give it a try.

You need to create a 3-column tsv file to have the chr, 0-based position and methylation frequency. For example:

MN908947.3	99	1.0
MN908947.3	1184	0.0
MN908947.3	1456	0.5
MN908947.3	20031 	0.7

Note that they must be tab separated. This example file is in test/methfreq.tsv

If you have a f5c/nanopolish methylation frequency TSV, you can use the following command to generate the necessary file for squigulator:

tail -n+2 methfreq.tsv | cut -f 1,2,7 > sqmeth.tsv

Then you can provide this file to squigulator like below:

./squigulator -x dna-r9-min -n 4000 -o a.blow5 test/nCoV-2019.reference.fasta --meth-freq test/methfreq.tsv

I tested this generated BLOW5 by basecalling (using buttery-eel) and running f5c:

eel -x cuda:all -i a.blow5 -o a.fastq --config dna_r9.4.1_450bps_sup.cfg
minimap2 -ax map-ont test/nCoV-2019.reference.fasta a.fastq  --secondary=no | samtools sort - -o a.bam && samtools index a.bam
f5c index --slow5 a.blow5 a.fastq
f5c call-methylation --slow5 a.blow5 -r a.fastq -b a.bam -g test/nCoV-2019.reference.fasta -o a.tsv
f5c meth-freq -i a.tsv -s -o afreq.tsv

The frequencies in afreq.tsv quite matched the ones in test/methfreq.tsv.

hasindu2008 avatar Feb 28 '24 09:02 hasindu2008

Thank you @hasindu2008

xies4 avatar Mar 01 '24 12:03 xies4

This is now merged to main and is available in the latest release

hasindu2008 avatar Sep 20 '24 12:09 hasindu2008