Re Dev : To Do

Open snewhouse opened this issue 8 years ago • 13 comments

New Branch in Git repo

  • [x] make a new branch

f1000_dev on image /home/ubuntu/scratch/ngseasy

Openstack VM

  • [x] space
  • [x] send key to amos
  • [x] 30+ CPU
  • [x] max RAM
  • [x] Volume : 4TB

Images

  • [ ] build images
  • [ ] build tool set
  • [x] build one image with all tools

Get Genomes

  • [x] hg19.fasta
  • [x] hs37d5.fasta
  • [ ] GRCh38.p7.fasta
  • [ ] hs38DH.fasta
  • [ ] gatk resources bundles
17.05.2016
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz
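A minimal sketch for fetching the references without re-downloading files that are already on the volume. `fetch_genome` and the `DRY_RUN` switch are hypothetical names, not part of ngseasy; `wget -c -P` is assumed to be available on the VM, and the URL in the usage comment is the one noted above, which may have moved since.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a reference file unless it is already present.
# Set DRY_RUN=1 to print what would be fetched instead of calling wget.
fetch_genome() {
    local url="$1" dest_dir="$2"
    local target="${dest_dir}/${url##*/}"   # file name = last URL component

    # Self check: a non-empty file with the same name counts as done.
    if [ -s "$target" ]; then
        echo "skip: $target already exists"
        return 0
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "would fetch: $url -> $target"
        return 0
    fi
    mkdir -p "$dest_dir"
    wget -c -P "$dest_dir" "$url"           # -c resumes a partial download
}

# Usage (not executed here):
# fetch_genome ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz ./genomes
```

The existence check only looks at the file name and size, so a truncated download from an earlier run would still need a checksum comparison against the provider's listing.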

Get test data

  • [ ] small 30-150x data set

Index Genomes

  • [ ] bwa
    • [x] hg19.fasta
    • [x] hs37d5.fasta
    • [ ] hs38DH.fasta
  • [ ] snap
    • [ ] hg19.fasta
    • [ ] hs37d5.fasta
    • [ ] hs38DH.fasta
  • [ ] novoalign
    • [ ] hg19.fasta
    • [ ] hs37d5.fasta
    • [ ] hs38DH.fasta
  • [ ] bowtie2
    • [ ] hg19.fasta
    • [ ] hs37d5.fasta
    • [ ] hs38DH.fasta

bwa

├── hs37d5.fasta
├── hs37d5.fasta.amb
├── hs37d5.fasta.ann
├── hs37d5.fasta.bwt
├── hs37d5.fasta.pac
├── hs37d5.fasta.sa
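A rough sketch of how the bwa part of the index checklist could be made re-runnable: the listing above shows the five files `bwa index` writes next to the FASTA, so their presence can serve as the "already indexed" test. The function names are hypothetical, and the other aligners (snap, novoalign, bowtie2) are not covered here.

```shell
#!/usr/bin/env bash
# Suffixes of the files that `bwa index` writes next to the FASTA.
BWA_SUFFIXES="amb ann bwt pac sa"

# Return 0 only if every bwa index file exists and is non-empty.
bwa_index_complete() {
    local fasta="$1" suffix
    for suffix in $BWA_SUFFIXES; do
        [ -s "${fasta}.${suffix}" ] || return 1
    done
    return 0
}

# Index a reference only when the index is missing or incomplete.
index_with_bwa() {
    local fasta="$1"
    if bwa_index_complete "$fasta"; then
        echo "bwa index present: $fasta"
    else
        bwa index "$fasta"
    fi
}

# Usage (not executed here):
# for ref in hg19.fasta hs37d5.fasta hs38DH.fasta; do index_with_bwa "$ref"; done
```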

PLAN BY MONDAY 23rd

giab_data_indexes

https://github.com/genome-in-a-bottle/giab_data_indexes

Test Data

  • [ ] 30x Exome
  • [ ] 150x Exome
  • [ ] 1x WGS at 30x min. (source better WGS data set as X10 is shit and messy)

GATK Gold Standard Run

  • [ ] run bwa-realign-bqsr-haplotypecaller on all 3 data sets

This is the "Gold Standard". This will take a week if there are no bugs.
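For orientation, the bwa-realign-bqsr-haplotypecaller chain could look roughly like the dry run below, which only prints the commands. This is a sketch under loud assumptions: GATK 3.x syntax (`GenomeAnalysisTK.jar -T ...`, since the realign step implies a pre-4 release), samtools 1.3 or later, and placeholder sample, reference and known-sites file names that do not come from the issue. Duplicate marking is left out because the chain as written does not name it.

```shell
#!/usr/bin/env bash
# Dry run: print, do not execute, the gold-standard chain for one sample.
# ASSUMPTIONS: GATK 3.x jar name, samtools >= 1.3, and all file names are
# placeholders chosen for illustration.
gold_standard_cmds() {
    local sample="$1" ref="$2" fq1="$3" fq2="$4" known="$5"
    local gatk="java -jar GenomeAnalysisTK.jar"
    # Read group with literal \t separators; bwa mem expands them itself.
    local rg="@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA"
    cat <<EOF
bwa mem -R '${rg}' ${ref} ${fq1} ${fq2} | samtools sort -o ${sample}.bam -
samtools index ${sample}.bam
${gatk} -T RealignerTargetCreator -R ${ref} -I ${sample}.bam -o ${sample}.intervals
${gatk} -T IndelRealigner -R ${ref} -I ${sample}.bam -targetIntervals ${sample}.intervals -o ${sample}.realn.bam
${gatk} -T BaseRecalibrator -R ${ref} -I ${sample}.realn.bam -knownSites ${known} -o ${sample}.recal.table
${gatk} -T PrintReads -R ${ref} -I ${sample}.realn.bam -BQSR ${sample}.recal.table -o ${sample}.recal.bam
${gatk} -T HaplotypeCaller -R ${ref} -I ${sample}.recal.bam -o ${sample}.vcf
EOF
}

gold_standard_cmds sample1 hs37d5.fasta sample1_1.fq.gz sample1_2.fq.gz known_sites.vcf
```

GATK also expects a `.fai` index and a sequence dictionary next to the reference, so those would have to be built once per genome before any of this runs.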

The Glue

Open :-

  1. BASH done better than before
  • logging
  • read a user supplied config file (spreadsheet like)
  • user specifies the pipeline
  • SJN TO ADD CONFIG PARAMETER LIST
  • consider converting to .yaml behind the scenes
  • self checks : does the input exist? if yes, move on
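The glue items above (logging, a spreadsheet-like config, self checks) could be sketched as below. The column layout `sample, fastq1, fastq2, pipeline` and all function names are assumptions for illustration; the real parameter list is still to be added by SJN.

```shell
#!/usr/bin/env bash
# Timestamped log line on stderr, so stdout stays free for results.
log() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >&2
}

# Self check: succeed only if the input exists and is non-empty.
require_input() {
    if [ -s "$1" ]; then
        log INFO "found input: $1"
    else
        log ERROR "missing input: $1"
        return 1
    fi
}

# Read a tab-separated config (hypothetical columns: sample, fastq1, fastq2,
# pipeline). Skips the header row, blank lines and '#' comments, and prints
# the pipeline that would run for each sample whose inputs exist.
run_config() {
    local config="$1" sample fq1 fq2 pipeline tab
    tab=$(printf '\t')
    require_input "$config" || return 1
    while IFS="$tab" read -r sample fq1 fq2 pipeline || [ -n "$sample" ]; do
        case "$sample" in ''|\#*|sample) continue ;; esac
        if require_input "$fq1" && require_input "$fq2"; then
            echo "$sample: $pipeline"
        else
            log WARN "skipping $sample"
        fi
    done < "$config"
}

# Usage (not executed here):
# run_config samples.tsv
```

Because tab counts as whitespace for `read`, consecutive tabs collapse, so empty cells would shift the columns; a YAML or CSV parser behind the scenes would avoid that.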

RECON BY MONDAY NEXT WEEK

snewhouse, May 08 '16 12:05

hg19
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/

snewhouse, May 17 '16 12:05

https://github.com/lh3/bwa/blob/master/bwakit/run-gen-ref

snewhouse, May 17 '16 12:05

cloned into /home/ubuntu/scratch/ngseasy

snewhouse, May 17 '16 14:05

http://bcb.io/2015/09/17/hg38-validation/

snewhouse, May 17 '16 14:05

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/README.txt

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz

snewhouse, May 17 '16 14:05

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

the readme

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/README.20150309.GRCh38_full_analysis_set_plus_decoy_hla

snewhouse, May 17 '16 14:05

http://biodata.s3-website-us-east-1.amazonaws.com/hg38_bundle/

snewhouse, May 17 '16 14:05

https://github.com/iiiir/GRCH38_gatk_bundle

snewhouse, May 17 '16 14:05

https://github.com/BD2KGenomics/gatk-whole-genome-pipeline/blob/master/GATKsetup.sh

snewhouse, May 17 '16 15:05

https://github.com/BD2KGenomics

snewhouse, May 17 '16 15:05

SJN dev in /mnt/data1/scratch/ngseasy

snewhouse, May 18 '16 11:05

git lfs : https://git-lfs.github.com/

snewhouse, May 18 '16 11:05

cp -v GRCh38_full_analysis_set_plus_decoy_hla.fa GRCh38dH.fasta
'GRCh38_full_analysis_set_plus_decoy_hla.fa' -> 'GRCh38dH.fasta'

snewhouse, May 18 '16 13:05