Run AmpliconSuite-pipeline on WGBS data

Open QLZhouBio opened this issue 5 months ago • 8 comments

Thanks for developing this tool for ecDNA analysis. I have some BAM files generated from WGBS data and am wondering whether AmpliconSuite can use these BAM files as input for ecDNA detection. If not, are there any other tools that could be used? Thanks.

QLZhouBio avatar Jan 22 '24 06:01 QLZhouBio

Hi, thanks for this question. AmpliconSuite-pipeline is not designed for analysis with whole genome bisulfite sequencing, only paired-end whole genome sequencing. I am not currently aware of any existing tools that take WGBS as input and provide ecDNA predictions.

Jens

jluebeck avatar Jan 22 '24 06:01 jluebeck

Thanks for the prompt reply. Is there any particular reason that AmpliconSuite is not able to process WGBS data? I think it would be quite interesting to utilize WGBS data for ecDNA detection.

QLZhouBio avatar Jan 22 '24 06:01 QLZhouBio

I am not an expert in WGBS; however, the conversion of unmethylated cytosine to uracil, and subsequently to thymine, will likely reduce the quality of alignments, creating challenges in both SV detection and CN calling. Another concern is that the WGBS protocol could cause uneven coverage due to possible bias in fragment selection.

If the reads are PE, you are welcome to try and see what happens. If you had any way to convert the altered thymine bases in the reads back to the reference allele (pseudo-wgs) then you might be able to do something with the data.

Jens

jluebeck avatar Jan 22 '24 06:01 jluebeck
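To make the "pseudo-WGS" idea above concrete, here is a minimal Python sketch of reverting bisulfite-consistent mismatches to the reference base. It is an illustration only, not part of AmpliconSuite-pipeline. The function name `revert_bisulfite_mismatches` and its parameters are hypothetical, and it assumes an ungapped alignment where the read and reference strings have equal length. It also assumes you know which conversion applies to the read (reference C read as T, or reference G read as A); in Bismark BAMs this is, as far as I know, recorded in the `XG:Z` tag, but check the Bismark documentation before relying on that.

```python
def revert_bisulfite_mismatches(read: str, ref: str, conversion: str = "CT") -> str:
    """Set bisulfite-consistent mismatches in `read` back to the reference base.

    Assumptions (hypothetical helper, not an AmpliconSuite function):
      * `read` and `ref` are the same length (ungapped alignment).
      * conversion == "CT": a reference C observed as T is treated as converted.
      * conversion == "GA": a reference G observed as A is treated as converted.
    All other mismatches are left untouched.
    """
    if len(read) != len(ref):
        raise ValueError("read and ref must have the same length")
    if conversion == "CT":
        ref_base, read_base = "C", "T"
    elif conversion == "GA":
        ref_base, read_base = "G", "A"
    else:
        raise ValueError("conversion must be 'CT' or 'GA'")

    restored = []
    for observed, expected in zip(read.upper(), ref.upper()):
        if expected == ref_base and observed == read_base:
            # Mismatch is explainable by bisulfite conversion: use the reference base.
            restored.append(expected)
        else:
            # Match, or a mismatch that conversion cannot explain: keep the read base.
            restored.append(observed)
    return "".join(restored)


if __name__ == "__main__":
    # Reference C at positions 0 and 4 was read as T; both are restored to C.
    print(revert_bisulfite_mismatches("TTGATG", "CTGACG"))
```

Note the trade-off: this approach also erases genuine C>T (or G>A) variants, and it does nothing for reads that failed to align in the first place, so it would not by itself recover missing SV evidence.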

You can get fairly good quality focal amplification calls from cheap low-pass WGS (1x coverage) if generating additional data is an option available to you.

jluebeck avatar Jan 22 '24 06:01 jluebeck

Hi Jens, thanks for the clarification. I would like to try using the BAM files from the Bismark pipeline as input to AmpliconSuite and see how it goes. In the meantime, may I double-check with you whether the hg38 reference genome FASTA file is used in the seed interval selection step or in any other downstream step after the bwa mapping step? If so, it might be a potential issue since, as you indicated, many Cs become Ts in the BAM files after conversion.

BTW, since we have generated a database with a few thousand WGBS samples, I think it would be very difficult to redo WGS with shallow sequencing :)

QLZhouBio avatar Jan 22 '24 08:01 QLZhouBio

Hi, the pipeline is primarily going to use mapping quality scores from the bam. These are based on how well the read aligned to the reference. This is used in both SV and CN detection. So in a sense the reference is used in all stages of the pipeline.

I am guessing you may easily find some CN seeds but have a difficult time recovering SVs. I have no clue how even the coverage is for WGBS, and if it is not even, this will be a big problem.

If you can find a tool that takes a WGBS bam and converts the TG basepairs back to CG where appropriate then you may have more luck. Such a tool may not exist.

Jens

jluebeck avatar Jan 22 '24 17:01 jluebeck
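The two concerns raised above (mapping quality and coverage evenness) can be checked on a WGBS BAM before running the pipeline. The following is a rough Python sketch, not something taken from AmpliconSuite-pipeline. Both function names are hypothetical, the MAPQ threshold of 5 is an arbitrary illustrative value, and the inputs are assumed to be (a) SAM-format text lines, such as the output of `samtools view`, and (b) per-bin read depths that you have already computed with a tool of your choice.

```python
from statistics import mean, pstdev
from typing import Iterable


def fraction_mapq_at_least(sam_lines: Iterable[str], threshold: int = 5) -> float:
    """Fraction of SAM alignment records whose MAPQ (column 5) is >= threshold.

    Header lines (starting with '@') and malformed lines are skipped.
    The default threshold is an illustrative assumption, not a pipeline setting.
    """
    total = 0
    passing = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM records have 11 mandatory columns
            continue
        total += 1
        if int(fields[4]) >= threshold:
            passing += 1
    return passing / total if total else 0.0


def coverage_cv(bin_depths: Iterable[float]) -> float:
    """Coefficient of variation (population stdev / mean) of per-bin depth.

    Lower values mean more even coverage; 0.0 means perfectly uniform bins.
    """
    depths = list(bin_depths)
    if not depths:
        raise ValueError("no bins supplied")
    average = mean(depths)
    if average == 0:
        raise ValueError("mean depth is zero")
    return pstdev(depths) / average


if __name__ == "__main__":
    records = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
        "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tFFFF",
    ]
    print(fraction_mapq_at_least(records))
    print(coverage_cv([10, 10, 10, 10]))
```

Comparing these two numbers between matched WGS and WGBS samples gives a quick sense of how much alignment quality and coverage uniformity degrade, although neither value has an established cutoff for this use case.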

Hi Jens, thanks for your kind clarification. I actually have WGS and WGBS data from a few of the same samples, and I ran AmpliconSuite on them. It seems that AA finds similar amplicons in both the WGS and WGBS data; however, the AC output is not the same. Is there an email address where I could share these AA and AC outputs with you so that you could take a closer look? Thanks.

QLZhouBio avatar Feb 15 '24 10:02 QLZhouBio

Sure - please feel free to share it to jluebeck [at] ucsd.edu

jluebeck avatar Feb 15 '24 17:02 jluebeck