AmpliconArchitect
DataRepo
Hello, thanks for AA! It works class! I was wondering if I can use my own fasta file (used to generate the bam)? Maybe you can give a brief explanation about the data_repo folder structure. Should I modify the dummy_ploidy file if I have an aneuploid sample?
Best
Alex
Hi Alex,
You can use your own fasta file to generate the BAM, without any modifications to the AA data_repo, provided it is some version of hg19, GRCh37, GRCh38/hg38, or mm10 (mouse). If it is something else (a genome build not listed, or a different species), generating the data repo becomes much more complicated, because many third-party annotation files must be collected. We have found that those annotations are not always available for certain species or builds. Note that if your fasta uses accession numbers instead of chromosome names (e.g. CM000663.2 instead of chr1), that will cause a problem.
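One way to check the naming point above before aligning is to scan the fasta headers. The following is a minimal sketch, not part of AA: it assumes that a reference with conventional names contains a sequence called either chr1 or 1, and the function names and example headers are made up for illustration.

```python
def contig_names(fasta_lines):
    """Yield the first whitespace-delimited token of each fasta header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def has_named_chromosomes(fasta_lines):
    """Return True if a sequence is named 'chr1' or '1' (heuristic assumption)."""
    names = set(contig_names(fasta_lines))
    return "chr1" in names or "1" in names


# In-memory demo; for a real file, pass an open file handle instead of a list.
named = [">chr1 example header", "ACGT", ">chr2", "ACGT"]
accession = [">CM000663.2 Homo sapiens chromosome 1", "ACGT"]
print(has_named_chromosomes(named))
print(has_named_chromosomes(accession))
```

If the check fails for a human or mouse reference, the headers probably use accession-style names and would need renaming before alignment.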
The dummy_ploidy file is only used if you are using PrepareAA with the Canvas CNV caller option to generate seed regions for AA. If you are starting with a BAM file, I recommend PrepareAA with the CNVKit caller option to identify seed regions.
I'll note, since it comes up occasionally, that if you align to a fasta with viral sequences and want AA to explore them, you should add the viral sequence(s) of interest to the AA seeds file (*_AA_CNV_SEEDS.bed if using PrepareAA) before running AA.
Thanks, and please let me know if any additional issues or questions arise!
Jens
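The viral seed step described in the reply could be scripted roughly as follows. This is only a sketch under stated assumptions: it reads sequence lengths from a samtools-style .fai index (tab-separated, name in column 1 and length in column 2) and writes each viral sequence as a whole-contig, three-column BED interval. Whether the seeds file needs additional columns is not stated in this thread, so compare against the existing lines in your own seeds file. The sequence name my_virus and the file paths are hypothetical.

```python
def viral_seed_lines(fai_lines, viral_names):
    """Build BED lines (name, 0, length) for the requested sequences in a .fai index."""
    wanted = set(viral_names)
    seeds = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] in wanted:
            # BED intervals are 0-based, half-open: the whole contig is [0, length).
            seeds.append(f"{fields[0]}\t0\t{int(fields[1])}")
    return seeds


def append_seeds(seeds_path, seed_lines):
    """Append the given BED lines to an existing seeds file."""
    with open(seeds_path, "a") as handle:
        for seed in seed_lines:
            handle.write(seed + "\n")


# Example usage (hypothetical paths and sequence name):
# with open("combined_reference.fa.fai") as fai:
#     seeds = viral_seed_lines(fai, ["my_virus"])
# append_seeds("sample_AA_CNV_SEEDS.bed", seeds)
```

Make sure the existing seeds file ends with a newline before appending, and that the viral sequence name matches the name used in the BAM header exactly.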
Hi jluebeck, do you mean that if I want to run AA in viral mode, I should first align my fastq files to a combined reference containing both the human and viral genomes, and then add the viral genome interval to *_AA_CNV_SEEDS.bed? And I don't even need to change the genome in data_repo before running AA?
Hi Alex, that's correct!