ngseasy
Re Dev : To Do
New Branch in GIT repo
- [x] make a new branch
f1000_dev
on image
/home/ubuntu/scratch/ngseasy
Openstack VM
- [x] space
- [x] send key to amos
- [x] 30+ CPU
- [x] max RAM
- [x] Volume : 4TB
Images
- [ ] build images
- [ ] build tool set
- [x] build one image with all tools
Get Genomes
- [x] hg19.fasta
- [x] hs37d5.fasta
- [ ] GRCh38.p7.fasta
- [ ] hs38DH.fasta
- [ ] gatk resources bundles
17.05.2016
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz
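A minimal sketch of how the download could be scripted, assuming `curl` and `gzip` are installed on the VM. The function name `fetch_reference` and the destination directory are hypothetical; the only detail taken from the notes is that the reference is a `.gz` file fetched by URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a gzipped reference and decompress it.
# Assumes the URL ends in .gz; skips the download if the FASTA already exists.
fetch_reference() {
    local url="$1"
    local dest_dir="$2"
    local archive fasta

    case "${url}" in
        *.gz) ;;
        *) echo "expected a .gz URL: ${url}" >&2; return 2 ;;
    esac

    archive="${dest_dir}/$(basename "${url}")"
    fasta="${archive%.gz}"

    mkdir -p "${dest_dir}" || return 1

    # Self check: do nothing if a non-empty FASTA is already in place.
    if [ -s "${fasta}" ]; then
        echo "already present: ${fasta}" >&2
        return 0
    fi

    curl --fail --location --silent --show-error \
         --output "${archive}" "${url}" || return 1

    # Decompress to a separate file and keep the archive for reference.
    gzip -dc "${archive}" > "${fasta}" || { rm -f "${fasta}"; return 1; }
}
```

It would be called with the URL above and a target directory of your choice, e.g. `fetch_reference "<url>" /home/ubuntu/scratch/ngseasy/genomes` (the `genomes` subdirectory is an assumption).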
Get test data
- [ ] small 30-150x data set
Index Genomes
- [ ] bwa
- [x] hg19.fasta
- [x] hs37d5.fasta
- [ ] hs38DH.fasta
- [ ] snap
- [ ] hg19.fasta
- [ ] hs37d5.fasta
- [ ] hs38DH.fasta
- [ ] novoalign
- [ ] hg19.fasta
- [ ] hs37d5.fasta
- [ ] hs38DH.fasta
- [ ] bowtie2
- [ ] hg19.fasta
- [ ] hs37d5.fasta
- [ ] hs38DH.fasta
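The checklist above could be driven from a small helper that prints the indexing command for each aligner and genome. This is a sketch only: `index_cmd` is a hypothetical name, only `bwa index` and `bowtie2-build` are covered, and the snap and novoalign commands are deliberately left out rather than guessed. Using the FASTA name minus `.fasta` as the bowtie2 index prefix is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the index command for a tool and FASTA.
index_cmd() {
    local tool="$1"
    local fasta="$2"

    case "${tool}" in
        bwa)
            echo "bwa index ${fasta}"
            ;;
        bowtie2)
            # bowtie2-build takes the reference and an index prefix.
            echo "bowtie2-build ${fasta} ${fasta%.fasta}"
            ;;
        *)
            echo "no index command sketched for: ${tool}" >&2
            return 1
            ;;
    esac
}

# Print the plan for every genome in the checklist.
print_index_plan() {
    local genome tool
    for genome in hg19.fasta hs37d5.fasta hs38DH.fasta; do
        for tool in bwa bowtie2; do
            index_cmd "${tool}" "${genome}"
        done
    done
}
```

Printing the commands first makes it easy to review the plan before piping it to `bash` or a job scheduler.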
bwa
├── hs37d5.fasta
├── hs37d5.fasta.amb
├── hs37d5.fasta.ann
├── hs37d5.fasta.bwt
├── hs37d5.fasta.pac
├── hs37d5.fasta.sa
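A quick self check for a finished bwa index, based on the five suffixes in the listing above (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The function name `bwa_index_complete` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the FASTA and all five bwa index
# files listed above exist next to it.
bwa_index_complete() {
    local fasta="$1"
    local ext

    if [ ! -f "${fasta}" ]; then
        echo "missing: ${fasta}" >&2
        return 1
    fi

    for ext in amb ann bwt pac sa; do
        if [ ! -f "${fasta}.${ext}" ]; then
            echo "missing: ${fasta}.${ext}" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `bwa_index_complete bwa/hs37d5.fasta || bwa index bwa/hs37d5.fasta` would rebuild the index only when a file is missing.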
PLAN BY MONDAY 23rd
giab_data_indexes
https://github.com/genome-in-a-bottle/giab_data_indexes
Test Data
- [ ] 30x Exome
- [ ] 150x Exome
- [ ] 1x WGS at 30x min. (source a better WGS data set, as the X10 data is messy)
GATK Gold Standard Run
- [ ] run bwa-realign-bqsr-haplotypecaller on all 3 data sets
This is the "Gold Standard". This will take a week if there are no bugs.
The Glue
Open :-
- BASH done better than before
- logging
- read a user supplied config file (spreadsheet like)
- user specifies the pipeline
- SJN TO ADD CONFIG PARAMETER LIST
- consider converting to .yaml behind the scenes
- self checks : does input exist? if yes, move on
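A rough sketch of what the logging, config reading and self-check pieces of the glue might look like in BASH. The config layout used here (tab-separated, one header row, columns `sample`, `fastq1`, `fastq2`) is an assumption for illustration only, since the real config parameter list is still to be added. The function names `log` and `check_inputs` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: column layout (sample, fastq1, fastq2) is assumed, not the
# final ngseasy config parameter list.

# log LEVEL MESSAGE... : timestamped line on stderr.
log() {
    local level="$1"
    shift
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "${level}" "$*" >&2
}

# check_inputs CONFIG : read a tab-separated config and verify that every
# listed input file exists. Returns 0 if all are present, 1 if any is
# missing, 2 if the config itself cannot be found.
check_inputs() {
    local config="$1"
    local sample fq1 fq2 f
    local line_no=0
    local status=0

    if [ ! -f "${config}" ]; then
        log ERROR "config not found: ${config}"
        return 2
    fi

    while IFS="$(printf '\t')" read -r sample fq1 fq2 || [ -n "${sample}" ]; do
        line_no=$((line_no + 1))
        # Skip the header row, blank lines and comment lines.
        if [ "${line_no}" -eq 1 ]; then continue; fi
        case "${sample}" in
            ''|'#'*) continue ;;
        esac

        for f in "${fq1}" "${fq2}"; do
            if [ -z "${f}" ]; then continue; fi
            if [ -f "${f}" ]; then
                log INFO "${sample}: found ${f}"
            else
                log ERROR "${sample}: missing ${f}"
                status=1
            fi
        done
    done < "${config}"

    return "${status}"
}
```

A pipeline wrapper could then gate each run with `check_inputs samples.tsv || exit 1` (the file name `samples.tsv` is a placeholder), so that nothing starts unless every input is in place.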
RECON BY MONDAY NEXT WEEK
hg19
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
https://github.com/lh3/bwa/blob/master/bwakit/run-gen-ref
cloned into /home/ubuntu/scratch/ngseasy
http://bcb.io/2015/09/17/hg38-validation/
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/README.txt
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
the readme
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/README.20150309.GRCh38_full_analysis_set_plus_decoy_hla
http://biodata.s3-website-us-east-1.amazonaws.com/hg38_bundle/
https://github.com/iiiir/GRCH38_gatk_bundle
https://github.com/BD2KGenomics/gatk-whole-genome-pipeline/blob/master/GATKsetup.sh
https://github.com/BD2KGenomics
SJN dev in /mnt/data1/scratch/ngseasy
git lfs : https://git-lfs.github.com/
cp -v GRCh38_full_analysis_set_plus_decoy_hla.fa GRCh38dH.fasta
'GRCh38_full_analysis_set_plus_decoy_hla.fa' -> 'GRCh38dH.fasta'