AmpliconArchitect
AmpliconArchitect copied to clipboard
detect ecDNAs from circle-seq data
Hi, I am trying to detect ecDNAs from my circle-seq data,is AA suitable for that? and I am going to use prepareAA and CNVkit to get the CNV bed files which will be feeded to AA. If not, could you give me some suggestions?
Sent with GitHawk
Hi,
We have used AA successfully on Circle-Seq data in the past (Kim et. al, 2020 Nature Genetics). I do not forsee any issues arising provided you follow the best practices for AA (which it sounds like you are already doing). Please feel free to reach out with questions or issues if you run into any.
Best regards, Jens
Thank you so much, Jens.
Yes I am trying to detect ecDNA from circle-seq data right now. And I am confused about my output from cnvkit:
There is no CNV meet the condition of "--gain 4.5 --cnsize_min 50000", namely, the total line of file "HeLa-45_L1_702D02.sort.dedup_CNV_GAIN.bed" is zero. And another sample just got 10 lines for this corresponding files.
My command lines is : /data4_v35/zwen/software/PrepareAA/PrepareAA.py -s hela -t 6 --cnvkit_dir /data4_v35/zwen/anaconda3/envs/cnvkit/bin/cnvkit.py --sorted_bam /data4_v35/zwen/zzx/ecDNA/HeLa-45_L1_702D02.sort.dedup.bam -o /data4_v35/zwen/zzx/ecDNA/AA/prepareAA/cnvkit_output --run_AA --ref GRCh38
could you give me some suggestions ?
Certainly happy to help. In this case, you can do the following (it looks like you already did these things using PrepareAA, but am providing info again regardless):
- Run the
convert_cns_to_bed.py
script from PrepareAA to create a bed file from the .cns output. Note that you will run the script on the fileHeLA-45_L1_702D02.sort.dedup.cns
, not the .call.cns or .bintest.cns files. - Run
amplified_intervals.py
on the bed file and bam file. Recommend--gain 4.5
and--cnsize_min 50000
.
These steps are equivalent to what is done using PrepareAA, which is what it appears you have run.
The CNV_GAIN.bed from PrepareAA conforms to filtering parameters specified by the user. We analyzed naive HeLa cells in Turner et al. Nature 2017, and did not find evidence of ecDNA. In cases where no ecDNA candidate regions are detected, the seed files will often be empty.
I hope this is helpful! Best, Jens
I have run: convert_cns_to_bed.py HeLa-45_L1_702D02.sort.dedup.cns Then the 5th column(copy number) of the output file HeLa-45_L1_702D02.sort.dedup.bed does not have any that > 4.5, so the seed interval will be zero. Thank you for explaining it for me and I understand that.
And thank you so much for telling me that you didn`t find any evidence of ecDNA in naive HeLa cells, that really helped. But I still analyze a set of HeLa WGS data (From GEO) with AA to, just make sure that I could run AA successfully. However, I have some difficulties in understanding the output results:
I get 15 amplicons, but how could I judge whether ecDNA exists from these output files?
Hi,
To make bioinformatic predictions of ecDNA status from AA outputs, you can try our AmpliconClassifier tool.
Best, Jens
Hi Jens: I'm also trying to detect ecDNA with Circle-seq. I successfully reconstructed the ecDNA with Circle-seq data coupled with corresponding WGS data for coverage calculation with --cbam parameter. But when it comes to Circle-seq data alone, there seems to be something wrong. My code is simple: python /share/home/chenjx/tools/AmpliconArchitect/src/AmpliconArchitect.py --bam batch4_C40e.dedup_above30.sorted.bam --bed batch4_C40e_circle_AA.bed --out batch4_C40e_circle_AA --ref "GRCh38"
error is as follow:
[root:INFO] Commandline: /share/home/chenjx/tools/AmpliconArchitect/src/AmpliconArchitect.py --bam batch4_C40e.dedup_above30.sorted.bam --bed batch4_C40e_circle_AA.bed --out batch4_C40e_circle_AA --ref GRCh38
[root:INFO] AmpliconArchitect version 1.2
[root:INFO] #TIME 4.860 Loading libraries and reference annotations for: GRCh38
Global ref name is GRCh38
[root:WARNING] #TIME 4.920 interval_list: Unable to open interval file "/share/home/chenjx/tools/AmpliconArchitect/data_repo/GRCh38/Genes_hg38.gff".
[root:INFO] #TIME 7.780 Initiating bam_to_breakpoint object for: batch4_C40e.dedup_above30.sorted.bam
0 1001 440 0.0
0 1001 192 0.0
read length 149.34706206177202 461.5485952576629 241.23039388372885 1185.2397769088493 0.951874601611
coverage stats (0, 0, 0, 0, 0, 0, 149.34706206177202, 461.5485952576629, 241.23039388372885, 0, 1185.2397769088493, 2, 0.9518746016109405) 0
pair support 2
[root:INFO] #TIME 45.960 Exploring interval: chr8 126222953 128411212
[root:INFO] #TIME 71.150 Searching new neighbors for interval: chr8 113042953 143941212
[root:INFO] #TIME 71.530 Calculating coverage meanshift segmentation
[root:INFO] #TIME 174.910 Detecting breakpoint edges
Traceback (most recent call last):
File "/share/home/chenjx/tools/AmpliconArchitect/src/AmpliconArchitect.py", line 214, in
And I attached the log file.
I would appreciate any advice!
Hi, unfortunately without the matched downsampled WGS, the AA algorithm will not always work properly since there must be some level of background genomic content against which to compute coverage. I suppose there are strategies for adding that kind of data to your sample post-hoc. However, if you are reconstructing from Circle-Seq data alone you may also consider contacting the authors of the Circle-Seq paper and they may be able to share their reconstruction method with you.
Thanks, Jens