Scratch pad

hs38     primary assembly of GRCh38 (incl. chromosomes, unplaced and unlocalized contigs) and EBV
hs38a    hs38 plus ALT contigs
hs38DH   hs38a plus decoy contigs and HLA genes (recommended for GRCh38 mapping)
hs37     primary assembly of GRCh37 (used by 1000g phase 1) plus the EBV genome
hs37d5   hs37 plus decoy contigs (used by 1000g phase 3)

For 1.0 to 1.X

ploidy options - freebayes
SEX chrom calling XX XY Y
lcr regions options
config file reading in fixes
config file order of options
platypus options testing and tweaks
test freebayes options - regions are mapped reads callable loci or just chromosomes
Improve logging
clean up
install biobambam
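
A possible starting point for the ploidy / sex chromosome items: a minimal sketch, assuming sample sex is recorded as XX or XY and that chromosome names may or may not carry a chr prefix. The function name ploidy_for is hypothetical. Pseudo-autosomal regions and chrM are not handled, and a result of 0 just means "skip this chromosome for this sample".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NGSeasy): pick a ploidy value from
# sample sex (XX or XY) and chromosome name.
ploidy_for() {
  local sex="$1"
  local chrom="${2#chr}"     # strip an optional "chr" prefix
  case "$sex:$chrom" in
    XY:X|XY:Y) echo 1 ;;     # hemizygous X and Y in XY samples (PAR ignored)
    XX:Y)      echo 0 ;;     # no chrY expected, caller should skip it
    *)         echo 2 ;;     # everything else treated as diploid
  esac
}
```

The value could then be handed to freebayes, for example `freebayes -f ref.fa --ploidy "$(ploidy_for XY chrX)" --region chrX in.bam > out.vcf`, where ref.fa, in.bam and out.vcf are placeholder file names.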

Future Dev

b38 indexes - GIAB
b38 pipelines - GIAB
gui - user install updates git
AWS gce pricing
registration for GATK and others
capture bed files from companies
recalling pipeline
cohort pipeline
cancer pipeline
CNV pipeline
Annotation pipeline
PGRS pipeline plus reporting
chanjo
bcbio options : give user option for bcbio or speedseq
test speedseq
need parser to create options for calling bcbio or speedseq

Browsers

http://genomesavant.com/p/savant/index

Apr 25 '15 11:04 snewhouse

ADD https://github.com/GregoryFaust/yaha

May 03 '15 10:05 snewhouse

vt decompose -s $VCF | vt normalize -r $REFERENCE - > $NEW_VCF

change variant calling to use the above; replace vcfallelicprimitives with vt decompose

see http://gemini.readthedocs.org/en/latest/content/preprocessing.html

May 05 '15 17:05 snewhouse

# decompose, normalize and annotate VCF with snpEff.
# NOTE: can also swap snpEff with VEP
# NOTE: -classic and -formatEff flags needed with snpEff >= v4.1
zless $VCF \
   | sed 's/ID=AD,Number=./ID=AD,Number=R/' \
   | vt decompose -s - \
   | vt normalize -r $REF - \
   | java -Xmx4G -jar $SNPEFFJAR -formatEff -classic GRCh37.75  \
   | bgzip -c > $NORMVCF
tabix $NORMVCF
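
The sed step in the pipeline above rewrites the header definition of the AD field so that it is declared per allele (Number=R), which, as far as I understand it, is what lets vt decompose split AD correctly for multi-allelic sites. A minimal sketch of just that step, with a hypothetical function name; here the dot is escaped so that only a literal "Number=." is replaced, which is slightly stricter than the one-liner above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fix the AD header definition of a VCF read from stdin.
# Only the header line that declares ID=AD with Number=. is changed.
fix_ad_number() {
  sed 's/ID=AD,Number=\./ID=AD,Number=R/'
}

# Example on a single header line (the Description text is made up):
printf '%s\n' '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths">' \
  | fix_ad_number
```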

May 05 '15 17:05 snewhouse

Add decoy genomes

EBV, human pathogens

May 20 '15 13:05 snewhouse

Add SURPI

May 20 '15 13:05 snewhouse

GATK 3

add options and scripts for HC gVCF stuff (1.5 or 2.0 release)

May 22 '15 10:05 snewhouse

lobSTR

Abundant contribution of short tandem repeats to gene expression variation in humans

Melissa Gymrek, Thomas Willems, Haoyang Zeng, Barak Markus, Mark J Daly, Alkes L Price, Jonathan Pritchard, Yaniv Erlich

doi: http://dx.doi.org/10.1101/017459

Abstract

Expression quantitative trait loci (eQTLs) are a key tool to dissect cellular processes mediating complex diseases. However, little is known about the role of repetitive elements as eQTLs. We report a genome-wide survey of the contribution of Short Tandem Repeats (STRs), one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from linked SNPs and indels and found that eSTRs contribute 10%-15% of the cis-heritability mediated by all common variants. Functional genomic analyses showed that eSTRs are enriched in conserved regions, co-localize with regulatory elements, and are predicted to modulate histone modifications. Our results show that eSTRs provide a novel set of regulatory variants and highlight the contribution of repeats to the genetic architecture of quantitative human traits.

May 22 '15 16:05 snewhouse

mention in the docs that ngseasy is run as NON-ROOT within the container.
Docker 1.7 should have user namespace (userns) support

May 29 '15 08:05 snewhouse

userns has been dropped from Docker 1.7. Trying to find out when it is penned in (hopefully v1.8)

Jun 11 '15 11:06 afolarin

better logging
cleanups

Jul 01 '15 11:07 snewhouse

mrFAST and mrsFAST

mrCaNaVaR

Alkan et al, “Personalized copy number and segmental duplication maps using next-generation sequencing.”, Nature Genetics, 2009

● Also used in:

○ Sudmant et al, Science, 2010; 1000 Genomes Project and SV companions (2010,2011,2012,2015), Great Ape Diversity Project (Prado-Martinez et al, Nature, 2013), several genome projects (gorilla, bonobo), Neandertal and Denisova projects, dog domestication (Freedman et al, 2014), cat domestication (Montague et al, 2014 and Tamazian et al 2014), maybe others...

MetaSV: an accurate and integrative structural-variant caller for next generation sequencing.

Jul 15 '15 05:07 snewhouse

http://mrcanavar.sourceforge.net/

mrCaNaVaR (mɪstər ʤʌnʌvʌr) is a copy number caller that analyzes the whole-genome next-generation sequence mapping read depth to discover large segmental duplications and deletions. It also has the capability of predicting absolute copy numbers of genomic intervals.

Jul 15 '15 05:07 snewhouse

vt peek v0.5

description : Summarizes the variants in a VCF file

usage : vt peek [options] <in.vcf>

options : -y  output pdf file [summary.pdf]
          -x  output latex directory []
          -f  filter expression []
          -I  file containing list of intervals []
          -i  intervals []
          -r  reference sequence fasta file []
          -?  displays help

Jul 17 '15 13:07 snewhouse

https://github.com/hall-lab/speedseq/blob/master/example/example_speedseq_install.sh

Jul 18 '15 06:07 snewhouse

support non-freebayes/GATK callers by preprocessing

https://github.com/arq5x/gemini/issues/409 https://gist.github.com/brentp/4db670df147cbd5a2b32

brentp / preprocess.py

https://gist.github.com/brentp/4db670df147cbd5a2b32#file-preprocess-py

Jul 19 '15 07:07 snewhouse

Platypus uses its own VCF nomenclature: TC == DP, FR == AF

##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=2015-07-18
##source=Platypus_Version_0.8.1
##platypusOptions={'assemblyRegionSize': 1500, 'trimReadFlank': 0, 'assembleBadReads': 1, 'bamFiles': ['/home/pipeman/ngs_projects/GCAT_Data/NA12878/alignments/NA12878.WEX.100bp30x.PE.ILLUMINA.btrim.snap.hg19.filtered.bam'], 'minVarDist': 9, 'trimSoftClipped': 1, 'minReads': 2, 'qualBinSize': 1, 'refFile': '/home/pipeman/ngs_projects/ngseasy_
##filter="QUAL > 5"
##INFO=<ID=FR,Number=.,Type=Float,Description="Estimated population frequency of variant">
##INFO=<ID=MMLQ,Number=1,Type=Float,Description="Median minimum base quality for bases around variant">
##INFO=<ID=TCR,Number=1,Type=Integer,Description="Total reverse strand coverage at this locus">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Homopolymer run length around variant locus">
##INFO=<ID=WE,Number=1,Type=Integer,Description="End position of calling window">
##INFO=<ID=Source,Number=.,Type=String,Description="Was this variant suggested by Playtypus, Assembler, or from a VCF?">
##INFO=<ID=FS,Number=.,Type=Float,Description="Fisher's exact test for strand bias (Phred scale)">
##INFO=<ID=WS,Number=1,Type=Integer,Description="Starting position of calling window">
##INFO=<ID=PP,Number=.,Type=Float,Description="Posterior probability (phred scaled) that this variant segregates">
##INFO=<ID=TR,Number=.,Type=Integer,Description="Total number of reads containing this variant">
##INFO=<ID=NF,Number=.,Type=Integer,Description="Total number of forward reads containing this variant">
##INFO=<ID=TCF,Number=1,Type=Integer,Description="Total forward strand coverage at this locus">
##INFO=<ID=NR,Number=.,Type=Integer,Description="Total number of reverse reads containing this variant">
##INFO=<ID=TC,Number=1,Type=Integer,Description="Total coverage at this locus">
##INFO=<ID=END,Number=.,Type=Integer,Description="End position of reference call block">
##INFO=<ID=MGOF,Number=.,Type=Integer,Description="Worst goodness-of-fit value reported across all samples">
##INFO=<ID=SbPval,Number=.,Type=Float,Description="Binomial P-value for strand bias test">
##INFO=<ID=START,Number=.,Type=Integer,Description="Start position of reference call block">
##INFO=<ID=ReadPosRankSum,Number=.,Type=Float,Description="Mann-Whitney Rank sum test for difference between in positions of variants in reads from ref and alt">
##INFO=<ID=MQ,Number=.,Type=Float,Description="Root mean square of mapping qualities of reads at the variant position">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant-quality/read-depth for this variant">
##INFO=<ID=SC,Number=1,Type=String,Description="Genomic sequence 10 bases either side of variant position">
##INFO=<ID=BRF,Number=1,Type=Float,Description="Fraction of reads around this variant that failed filters">
##INFO=<ID=HapScore,Number=.,Type=Integer,Description="Haplotype score measuring the number of haplotypes the variant is segregating into in a window">
##INFO=<ID=Size,Number=.,Type=Integer,Description="Size of reference call block">
##FILTER=<ID=GOF,Description="Variant fails goodness-of-fit test.">
##FILTER=<ID=badReads,Description="Variant supported only by reads with low quality bases close to variant position, and not present on both strands.">
##FILTER=<ID=alleleBias,Description="Variant frequency is lower than expected for het">
##FILTER=<ID=hp10,Description="Flanking sequence contains homopolymer of length 10 or greater">
##FILTER=<ID=Q20,Description="Variant quality is below 20.">
##FILTER=<ID=HapScore,Description="Too many haplotypes are supported by the data in this region.">
##FILTER=<ID=MQ,Description="Root-mean-square mapping quality across calling region is low.">
##FILTER=<ID=strandBias,Description="Variant fails strand-bias filter">
##FILTER=<ID=SC,Description="Variants fail sequence-context filter. Surrounding sequence is low-complexity">
##FILTER=<ID=QualDepth,Description="Variant quality/Read depth ratio is low.">
##FILTER=<ID=REFCALL,Description="This line represents a homozygous reference call">
##FILTER=<ID=QD,Description="Variants fail quality/depth filter.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=GQ,Number=.,Type=Integer,Description="Genotype quality as phred score">
##FORMAT=<ID=GOF,Number=.,Type=Float,Description="Goodness of fit value">
##FORMAT=<ID=NR,Number=.,Type=Integer,Description="Number of reads covering variant location in this sample">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype log10-likelihoods for AA,AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites">
##FORMAT=<ID=NV,Number=.,Type=Integer,Description="Number of reads containing variant in this sample">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=OLD_VARIANT,Number=1,Type=String,Description="Original chr:pos:ref:alt encoding">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12878.WEX.100bp30x.PE.ILLUMINA.btrim.snap.hg19
chr1    14653   .       C       T       96      PASS    AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=2;HapScore=2;MGOF=2;MMLQ=40;MQ=34.13;NF=4;NR=0;NS=1;PP=96;QD=32.6928;SC=GTCAGAGCAACGGCCCAAGTC;SbPval=1;Source=Platypus,Assembler;TC=11;TCF=9;TCR=2;TR=4;WE=14661;WS=14643      GT:GL:GOF:GQ:NR:NV      0/1:-13.69,0,-15.39:2:99:11:4
chr1    14907   .       A       G       1798    PASS    AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=1;HapScore=1;MGOF=26;MMLQ=35;MQ=39.8;NF=34;NR=23;NS=1;PP=1798;QD=32.1539;SC=AAATACAGGAAGAAAAAGGCA;SbPval=0.65;Source=Platypus,Assembler;TC=63;TCF=36;TCR=27;TR=57;WE=14915;WS=14897    GT:GL:GOF:GQ:NR:NV      0/1:-183.84,0,-4.54:26:45:63:57
chr1    14930   .       A       G       2288    PASS    AC=2;AF=1;AN=2;BRF=0;FR=1;HP=1;HapScore=1;MGOF=24;MMLQ=31;MQ=39.56;NF=43;NR=24;NS=1;PP=2288;QD=34.6682;SC=ACAGAATTACAAGGTGCTGGC;SbPval=0.52;Source=Platypus,Assembler;TC=70;TCF=45;TCR=25;TR=67;WE=14938;WS=14920       GT:GL:GOF:GQ:NR:NV      1/1:-232.3,-10.97,0:24:99:70:67
chr1    15118   .       A       G       288     PASS    AC=1;AF=0.5;AN=2;BRF=0;FR=0.5006;HP=2;HapScore=2;MGOF=0;MMLQ=33;MQ=28.68;NF=1;NR=9;NS=1;PP=288;QD=32.2771;SC=CCCCCATGACACTCCCCAGCC;SbPval=0.58;Source=Platypus,Assembler;TC=12;TCF=1;TCR=11;TR=10;WE=15126;WS=15108     GT:GL:GOF:GQ:NR:NV      0/1:-32.89,0,-2.89:0:29:12:10
chr1    16495   .       G       C       79      PASS    AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=1;HapScore=2;MGOF=20;MMLQ=35;MQ=25.13;NF=4;NR=0;NS=1;PP=79;QD=28.4428;SC=TATTTGAAATGGAAACTATTC;SbPval=1;Source=Platypus,Assembler;TC=11;TCF=10;TCR=1;TR=4;WE=16503;WS=16485    GT:GL:GOF:GQ:NR:NV      1/0:-11.99,0,-22.69:20:99:11:4
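
Since Platypus reports depth as TC rather than DP, one preprocessing option is to copy TC into a DP key so that downstream tools expecting DP can find it. The following is a minimal awk sketch under that assumption, written independently of the gist linked above. The function name platypus_add_dp and the wording of the added header line are hypothetical. FR is left alone because the records above already carry an AF value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Platypus VCF on stdin and add DP=<TC> to INFO.
platypus_add_dp() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Declare DP right after the TC header line.
    /^##INFO=<ID=TC,/ {
      print
      print "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total read depth, copied from Platypus TC\">"
      next
    }
    # Pass all other header lines through unchanged.
    /^#/ { print; next }
    {
      n = split($8, kv, ";")
      tc = ""; has_dp = 0
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^TC=/) tc = substr(kv[i], 4)   # TCF/TCR do not match
        if (kv[i] ~ /^DP=/) has_dp = 1
      }
      # Append DP only when TC exists and DP is not already present.
      if (tc != "" && !has_dp) $8 = $8 ";DP=" tc
      print
    }'
}
```

Usage would be along the lines of `zless platypus.vcf.gz | platypus_add_dp | bgzip -c > platypus.dp.vcf.gz`, with placeholder file names.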

Jul 19 '15 07:07 snewhouse

Freebayes vcf header

##fileformat=VCFv4.1
##fileDate=20150719
##source=freeBayes v0.9.21-19-gc003c1e
##reference=/home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_hg19/ucsc.hg19.fasta
##phasing=none
##commandline="freebayes -f /home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_hg19/ucsc.hg19.fasta -b /home/pipeman/ngs_projects/GCAT_Data/NA12878/alignments/NA12878.WEX.100bp30x.PE.ILLUMINA.no-trim.bwa.hg19.filtered.bam --min-coverage 2 --min-mapping-quality 20 --min-base-quality 20 --min-repeat-entropy 1 --genotype-qualities --
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred">
##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred">
##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele quality sum in phred for partial observations">
##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations">
##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand">
##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand">
##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand">
##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand">
##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous">
##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome">
##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPL,Number=A,Type=Float,Description="Reads Placed Left: number of reads supporting the alternate balanced to the left (5') of the alternate allele">
##INFO=<ID=RPR,Number=A,Type=Float,Description="Reads Placed Right: number of reads supporting the alternate balanced to the right (3') of the alternate allele">
##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio.  Ratio between depth in samples with each called alternate allele and those without.">
##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best.">
##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout.">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing.  Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR.">
##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position.">
##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles.">
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles">
##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles">
##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments">
##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments">
##INFO=<ID=technology.ILLUMINA,Number=A,Type=Float,Description="Fraction of observations supporting the alternate observed in reads from ILLUMINA">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12878.WEX.100bp30x.PE.ILLUMINA.no-trim.bwa.hg19
chrM    73      .       G       A       23.7578 .       AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=5.46559;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=44;QR=0;RO=0;RPL=2;RPP=7.35324;RPPR=0;RPR=0;RUN=1;SAF=2;SAP=7.35324;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp;t
chrM    150     .       TCT     CCC     211.135 .       AB=0;ABP=0;AC=2;AF=1;AN=2;AO=7;CIGAR=1X1M1X;DP=7;DPB=7;DPRA=0;EPP=3.32051;EPPR=0;GTI=0;LEN=3;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=14.3092;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=252;QR=0;RO=0;RPL=4;RPP=3.32051;RPPR=0;RPR=3;RUN=1;SAF=2;SAP=5.80219;SAR=5;SRF=0;SRP=0;SRR=0;TYPE=

Jul 19 '15 08:07 snewhouse

ADD svtools

https://github.com/hall-lab/svtools

Jul 19 '15 10:07 snewhouse

Not strictly an NGSeasy item, but set up Elasticluster on Rosalind: https://elasticluster.readthedocs.org/en/latest/

Jul 21 '15 09:07 afolarin

get http://www.ncbi.nlm.nih.gov/assembly
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.15_GRCh38
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/README_analysis_sets.txt

Jul 30 '15 15:07 snewhouse

https://github.com/mozack/abra

ABRA - Assembly Based ReAligner

Introduction

ABRA is a realigner for next generation sequencing data. It uses localized assembly and global realignment to align reads more accurately, thus improving downstream analysis (detection of indels and complex variants in particular).

Here is an ABRA realigned region (original reads on top, ABRA realigned reads on bottom). The original set of reads have rather "noisy" alignments with several variations from the reference and a fair bit of high quality soft clipping. The ABRA realignments present a more parsimonious representation of the reads including a previously unobserved large deletion.

Sep 07 '15 13:09 snewhouse

bam to fastq

sort BAM, index BAM, then samtools?
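
One possible answer, as a sketch only: samtools fastq expects the two reads of a pair to be adjacent, so the BAM would be name-sorted (no index needed) rather than coordinate-sorted and indexed. This assumes samtools 1.3 or later for the `sort -o` and `fastq` syntax. The function name, the output naming scheme and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a paired-end BAM to two FASTQ files.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

bam_to_fastq() {
  local bam="$1"
  local prefix="$2"
  # Name-sort so that mates sit next to each other.
  run samtools sort -n -o "${prefix}.namesorted.bam" "$bam"
  # Write read 1 and read 2; singletons and unflagged reads are discarded.
  run samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
      -0 /dev/null -s /dev/null -n "${prefix}.namesorted.bam"
}
```

`DRY_RUN=1 bam_to_fastq sample.bam sample` prints the two commands without needing samtools installed, which is handy for checking the file naming first.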

Sep 08 '15 13:09 snewhouse

Gel

Illumina BAM and VCFs
recall pipeline
realign pipeline

Isaac BAMs? Do they have ALL reads - or does Isaac chuck unmapped reads?

Sep 08 '15 13:09 snewhouse

automated version check and control flags

docker ngs tool build version
ngseasy scripts version
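
A version check could be as small as the comparison below. This is a sketch, assuming GNU sort with the -V (version sort) option is available. The function name version_ge is hypothetical, and how the tool and script versions get recorded in the first place is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on GNU sort -V to order dotted version strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

For example, `version_ge "$installed" 1.8 || echo "upgrade needed"`, where the variable installed is assumed to hold whatever version string the check has collected.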

Sep 13 '15 16:09 snewhouse
