DataRepo

Open al3xMlt opened this issue 2 years ago • 3 comments

Hello, thanks for AA! It works class! I was wondering if I can use my own fasta file (used to generate the bam)? Maybe you can give a brief explanation about the data_repo folder structure. Should I modify the dummy_ploidy file if I have an aneuploid sample?

Best

Alex

al3xMlt avatar Mar 23 '22 10:03 al3xMlt

Hi Alex,

You can use your own fasta file to generate the BAM, without any modifications to the AA data_repo, provided it is some version of hg19, GRCh37, GRCh38/hg38, or mm10 (mouse). If it is something else (a genome build not listed, or a different species), then the process for generating the data repo becomes much more complicated, as there are many third-party annotation files that must be collected. We have found that those annotations are not always available for certain species/builds. Note that if your fasta uses accession numbers instead of chromosome names (e.g. CM000663.2 instead of chr1), that will cause a problem.
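As a quick way to check for the accession-number problem mentioned above, here is a minimal Python sketch that pulls the sequence names out of a FASTA and flags ones that look like accessions. It is not part of AA. The function names and the regular expression are my own assumptions: the pattern only recognises GenBank `CM...` and RefSeq `NC_...` style names, so treat it as a rough heuristic.

```python
import re

# Assumption: accession-style chromosome names look like "CM000663.2" (GenBank)
# or "NC_000001.11" (RefSeq). This is a heuristic, not an AA rule.
ACCESSION_RE = re.compile(r"^(CM\d+|NC_\d+)\.\d+$")


def fasta_sequence_names(lines):
    """Return the sequence name (first token after '>') of every FASTA header."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def accession_style_names(names):
    """Return the names that match the accession-style heuristic."""
    return [name for name in names if ACCESSION_RE.match(name)]
```

To use it on a real file, something like `accession_style_names(fasta_sequence_names(open("ref.fa")))` should work, where `ref.fa` is a placeholder path. An empty result means no header matched the heuristic. It does not prove the naming matches what the data repo expects.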

The dummy_ploidy file is only used if you are running PrepareAA with the Canvas CNV caller option to generate seed regions for AA. If you are starting with a BAM file, I recommend PrepareAA with the CNVKit caller option to identify seed regions.

I'll note, since it comes up occasionally, if you align to a fasta with viral sequences and want AA to explore them, please add the viral sequence(s) of interest to the AA seeds file (*_AA_CNV_SEEDS.bed if using PrepareAA) before running AA.
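One way to build the extra seed lines is sketched below in Python. It reads contig lengths from a samtools `.fai` index (name and length are its first two tab-separated columns) and emits one BED interval covering each viral contig in full. The three-column `chrom, start, end` layout and the contig name `hpv16ref_1` are assumptions for illustration. Check the existing lines in your own seeds file and match their columns before appending anything.

```python
def viral_seed_lines(fai_lines, viral_names):
    """Build BED lines spanning each named contig, using lengths from a .fai index.

    Assumes a plain three-column BED layout (chrom, start, end); BED intervals
    are 0-based and half-open, so a whole contig is 0..length.
    """
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    # A KeyError here means the contig name is not in the reference index.
    return [f"{name}\t0\t{lengths[name]}" for name in viral_names]


# Hypothetical example: contig name and length are placeholders.
example_fai = ["chr1\t248956422\t6\t60\t61\n", "hpv16ref_1\t7906\t253404903\t60\t61\n"]
for bed_line in viral_seed_lines(example_fai, ["hpv16ref_1"]):
    print(bed_line)
```

The printed lines can then be appended to the seeds file by hand or with a short script.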

Thanks, and please let me know if any additional issues or questions arise!

Jens

jluebeck avatar Mar 23 '22 16:03 jluebeck

Hi jluebeck, do you mean that if I want to run AA in viral mode, I should first align my fastq file to a modified reference containing both the human and viral genomes, and then add the viral genome interval to *_AA_CNV_SEEDS.bed? And I do not even need to change the genome in data_repo before running AA?

panxiaoguang avatar Apr 26 '22 01:04 panxiaoguang

Hi Alex, that's correct!

jluebeck avatar Apr 26 '22 16:04 jluebeck