
Require snvboxGenes.fa file

Open vivekruhela opened this issue 1 year ago • 1 comments

Hello,

I am using the 2020+ tool to identify potential candidate driver genes. I can download the required files, such as snvboxGenes.bed and scores.tar.gz, but I cannot find the exact file required for genes.fa in the following command:

mut_annotate --summary -i genes.fa -b genes.bed -s score_dir -m mutations.txt -o summary.txt

I tried various FASTA files generated from the UCSC Table Browser, but none of them worked. Can you share the exact FASTA file you used in your published work? Thanks.

vivekruhela Jul 14 '23 12:07

Hi. The snvboxGenes.fa file (i.e. the input to -i in mut_annotate) is generated by the extract_gene_seq command (see https://probabilistic2020.readthedocs.io/en/latest/tutorial.html#gene-fasta). Download the hg19 genome in 2bit format from UCSC (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit), convert it from 2bit to FASTA format using UCSC's twoBitToFa command-line tool, and then run extract_gene_seq with hg19.fa and snvboxGenes.bed as input.
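For reference, the three steps above could be scripted roughly as follows. This is only a sketch: the use of wget, the DRY_RUN switch, and the run helper are illustrative additions, and the -i/-b/-o flags for extract_gene_seq are assumed from the linked tutorial, so confirm them with extract_gene_seq --help before relying on them. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each command, and run it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Download the hg19 genome in 2bit format from UCSC.
run wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit

# 2. Convert 2bit to FASTA with UCSC's twoBitToFa (input file, then output file).
run twoBitToFa hg19.2bit hg19.fa

# 3. Extract the gene sequences described in snvboxGenes.bed.
#    Flags are assumed from the probabilistic2020 tutorial; verify with --help.
run extract_gene_seq -i hg19.fa -b snvboxGenes.bed -o snvboxGenes.fa
```

The resulting snvboxGenes.fa would then be passed to mut_annotate via -i, alongside snvboxGenes.bed via -b.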

I've also attached the snvboxGenes.fa file below: snvboxGenes.fa.gz

ctokheim Jan 07 '24 22:01