
Specify additional "genomes" of spike-in controls for downstream calculation of bisulfite conversion efficiency

Open pdemko opened this issue 1 year ago • 7 comments

Description of feature

Library preparation kits come with unmethylated and methylated DNA controls that are added to samples (e.g. NEBNext EM-seq). It would be useful to supply methylseq with the FASTA sequences of the controls and have it perform additional alignment, deduplication, and methylation extraction against those references. This would enable calculation of bisulfite/enzymatic conversion efficiency.

Thank you for providing this excellent pipeline!

pdemko avatar Apr 01 '24 19:04 pdemko

At the moment, we're basically including them as part of the main reference genome FASTA and analyzing outside of methylseq. Having a way to include them separately would also be useful so additional spike-in specific analyses could be performed.

cjfields avatar Apr 08 '24 21:04 cjfields

Hello everybody,

Since I also need this feature, I am willing to develop it as part of the methylseq pipeline. Please see this conversation in Slack.

PanosProv avatar Nov 08 '24 13:11 PanosProv

I'm new to methylseq analysis. One of the scientists in our lab needs to run the analysis against the positive and negative control sequences (pUC19 and Lambda). I think this is the feature you're trying to add. What's the current status and how can I contribute?

trum994 avatar Feb 01 '25 00:02 trum994

Hello @trum994, I worked on it before Christmas and made some progress. It still needs more work, though, and I will resume it this week (I have been busy with other things in the meantime). I am also just learning Nextflow, so I am not the fastest at implementing it.

PanosProv avatar Feb 04 '25 16:02 PanosProv

Hi, just wondering if there are any tools available for analyzing pUC19 and lambda DNA?

panyuwen avatar May 05 '25 21:05 panyuwen

What I did was add the control sequences to the reference genome and then pass the combined file with the --fasta option. You can find the sequences here: https://neb-em-seq-sra.s3.amazonaws.com/grch38_core%2Bbs_controls.fa
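For anyone building such a combined reference themselves, here is a minimal sketch of the concatenation step. It is only an illustration, not part of methylseq: the function names and file paths are hypothetical, and it assumes plain-text (uncompressed) FASTA input. It refuses to merge if a control sequence name already exists in the genome, since duplicate names would make the control reads impossible to tell apart later.

```python
def fasta_names(path):
    """Return the sequence names (first word of each '>' header) in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                names.append(parts[0] if parts else "")
    return names


def combine_fastas(genome, controls, out):
    """Write genome + control sequences to `out`; all three paths are hypothetical.

    Raises ValueError if any control name collides with a genome sequence name.
    Returns the list of sequence names in the combined file.
    """
    clash = set(fasta_names(genome)).intersection(fasta_names(controls))
    if clash:
        raise ValueError(f"duplicate sequence names: {sorted(clash)}")
    with open(out, "w") as dst:
        for src in (genome, controls):
            with open(src) as handle:
                for line in handle:
                    # Guard against a missing final newline so that the next
                    # file's first header does not get glued onto sequence data.
                    dst.write(line if line.endswith("\n") else line + "\n")
    return fasta_names(out)
```

The combined file written by `combine_fastas` would then be the one passed to --fasta.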

mpiersonsmela avatar Aug 03 '25 16:08 mpiersonsmela

In cases where controls are added, we've done what @mpiersonsmela described, but added a post-processing round to extract the reads aligned to the control sequences, then evaluated methylation conversion on each control using Bismark's methylation_consistency. We also perform this during standard analyses using methylKit.
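As a rough illustration of the per-control evaluation (not the methylation_consistency or methylKit workflow itself), the sketch below sums methylation calls per control contig. It assumes Bismark's tab-separated coverage format (chromosome, start, end, percent methylation, count methylated, count unmethylated). The contig names `lambda` and `pUC19` are assumptions that must match the headers in your combined FASTA, as is the premise that lambda is the unmethylated control and pUC19 the CpG-methylated one.

```python
from collections import defaultdict


def calls_by_contig(cov_lines, contigs):
    """Sum (methylated, unmethylated) call counts per contig.

    `cov_lines` is any iterable of Bismark-style coverage lines; rows for
    contigs not listed in `contigs` (e.g. the main genome) are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] not in contigs:
            continue
        totals[fields[0]][0] += int(fields[4])  # count methylated
        totals[fields[0]][1] += int(fields[5])  # count unmethylated
    return {name: (meth, unmeth) for name, (meth, unmeth) in totals.items()}


def fraction(part, total):
    """Return part / total, or NaN when there are no calls at all."""
    return part / total if total else float("nan")


# Tiny made-up example; contig names are assumptions, not fixed by any tool.
example = [
    "lambda\t10\t10\t0\t0\t50\n",
    "lambda\t20\t20\t2\t1\t49\n",
    "pUC19\t5\t5\t100\t40\t0\n",
    "chr1\t100\t100\t50\t5\t5\n",
]
counts = calls_by_contig(example, {"lambda", "pUC19"})

# Unmethylated control: converted (unmethylated) calls over all calls.
lam_meth, lam_unmeth = counts["lambda"]
conversion_efficiency = fraction(lam_unmeth, lam_meth + lam_unmeth)

# Methylated control: methylated calls over all calls (should stay high).
puc_meth, puc_unmeth = counts["pUC19"]
methylated_retained = fraction(puc_meth, puc_meth + puc_unmeth)

print(f"lambda conversion efficiency: {conversion_efficiency:.2%}")
print(f"pUC19 methylation retained:   {methylated_retained:.2%}")
```

For a real run, the lines could come from a gzipped coverage file opened with `gzip.open(path, "rt")`.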

cjfields avatar Aug 03 '25 21:08 cjfields