ngseasy
to doz
Scratch pad
- hs38: primary assembly of GRCh38 (incl. chromosomes, unplaced and unlocalized contigs) and EBV
- hs38a: hs38 plus ALT contigs
- hs38DH: hs38a plus decoy contigs and HLA genes (recommended for GRCh38 mapping)
- hs37: primary assembly of GRCh37 (used by 1000g phase 1) plus the EBV genome
- hs37d5: hs37 plus decoy contigs (used by 1000g phase 3)
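A minimal sketch of how a config reader might validate the genome build name against the list above. The function name `describe_build` is hypothetical and not part of ngseasy; the descriptions are shortened from the list.

```shell
# Hypothetical helper (not an ngseasy function): print a short description
# for a known genome build name, or fail for anything else.
describe_build() {
  case "$1" in
    hs38)   echo "primary assembly of GRCh38 plus EBV" ;;
    hs38a)  echo "hs38 plus ALT contigs" ;;
    hs38DH) echo "hs38a plus decoy contigs and HLA genes" ;;
    hs37)   echo "primary assembly of GRCh37 plus EBV" ;;
    hs37d5) echo "hs37 plus decoy contigs" ;;
    *)      echo "unknown genome build: $1" >&2; return 1 ;;
  esac
}
```

Calling `describe_build hs38DH` prints the description; an unrecognised name returns a non-zero status so a calling script can stop early.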
For 1.0 to 1.X
- ploidy options - freebayes
- SEX chrom calling XX XY Y
- lcr regions options
- config file reading in fixes
- config file order of options
- platypus options testing and tweaks
- test freebayes options - regions are mapped reads callable loci or just chromosomes
- Improve logging
- clean up
- install biobambam
Future Dev
- b38 indexes - GIAB
- b38 pipelines - GIAB
- gui - user install updates git
- AWS gce pricing
- registration for GATK and others
- capture bed files from companies
- recalling pipeline
- cohort pipeline
- cancer pipeline
- CNV pipeline
- Annotation pipeline
- PGRS pipeline plus reporting
- chanjo
- bcbio options : give user option for bcbio or speedseq
- test speedseq
- need parser to create options for calling bcbio or speedseq
Browsers
http://genomesavant.com/p/savant/index
ADD https://github.com/GregoryFaust/yaha
vt decompose -s $VCF | vt normalize -r $REFERENCE - > $NEW_VCF
change var calling to use above ; sub vcfallelicprimitives for vt decompose
see http://gemini.readthedocs.org/en/latest/content/preprocessing.html
# decompose, normalize and annotate VCF with snpEff.
# NOTE: can also swap snpEff with VEP
#NOTE: -classic and -formatEff flags needed with snpEff >= v4.1
zless $VCF \
| sed 's/ID=AD,Number=./ID=AD,Number=R/' \
| vt decompose -s - \
| vt normalize -r $REF - \
| java -Xmx4G -jar $SNPEFFJAR -formatEff -classic GRCh37.75 \
| bgzip -c > $NORMVCF
tabix $NORMVCF
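The sed step in the pipeline above rewrites the AD header so that it is declared as Number=R (one value per allele, including the reference), which is presumably what lets vt decompose split the allele depths per allele. A small sketch of that step in isolation; the function name `fix_ad_header` and the sample header line are made up for illustration.

```shell
# Illustrative wrapper around the sed expression used in the pipeline.
# Note the "." in the pattern is a regex wildcard, so any single-character
# Number value for AD (".", "1", ...) is rewritten to "R".
fix_ad_header() {
  sed 's/ID=AD,Number=./ID=AD,Number=R/'
}

# Made-up header line to show the effect.
printf '%s\n' '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths">' \
  | fix_ad_header
```

Lines that do not contain `ID=AD,Number=` pass through unchanged, so the step is safe to run over the whole VCF stream.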
Add decoy genomes
EBV Human pathogens
Add SURPI
GATK 3
add options and scripts for HC gVCF stuff (1.5 or 2.0 release)
lobSTR
Abundant contribution of short tandem repeats to gene expression variation in humans
Melissa Gymrek, Thomas Willems, Haoyang Zeng, Barak Markus, Mark J Daly, Alkes L Price, Jonathan Pritchard, Yaniv Erlich
doi: http://dx.doi.org/10.1101/017459
Abstract
Expression quantitative trait loci (eQTLs) are a key tool to dissect cellular processes mediating complex diseases. However, little is known about the role of repetitive elements as eQTLs. We report a genome-wide survey of the contribution of Short Tandem Repeats (STRs), one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from linked SNPs and indels and found that eSTRs contribute 10%-15% of the cis-heritability mediated by all common variants. Functional genomic analyses showed that eSTRs are enriched in conserved regions, co-localize with regulatory elements, and are predicted to modulate histone modifications. Our results show that eSTRs provide a novel set of regulatory variants and highlight the contribution of repeats to the genetic architecture of quantitative human traits.
mention in doc that ngseasy is run as NON-ROOT within the container.
Docker 1.7 should have namespace
userns has been dropped from Docker 1.7. Trying to find out when it is penned in (hopefully v1.8)
better logging
cleanups
mrFAST and mrsFAST
mrCaNaVaR
Alkan et al, “Personalized copy number and segmental duplication maps using next-generation sequencing.”, Nature Genetics, 2009
● Also used in:
○ Sudmant et al, Science, 2010; 1000 Genomes Project and SV companions (2010,2011,2012,2015), Great Ape Diversity Project (Prado-Martinez et al, Nature, 2013), several genome projects (gorilla, bonobo), Neandertal and Denisova projects, dog domestication (Freedman et al, 2014), cat domestication (Montague et al, 2014 and Tamazian et al 2014), maybe others...
MetaSV: an accurate and integrative structural-variant caller for next generation sequencing.
http://mrcanavar.sourceforge.net/
mrCaNaVaR (mɪstər ʤʌnʌvʌr) is a copy number caller that analyzes the whole-genome next-generation sequence mapping read depth to discover large segmental duplications and deletions. It also has the capability of predicting absolute copy numbers of genomic intervals.
vt peek v0.5
description : Summarizes the variants in a VCF file
usage : vt peek [options] <in.vcf>
options : -y output pdf file [summary.pdf]
-x output latex directory []
-f filter expression []
-I file containing list of intervals []
-i intervals []
-r reference sequence fasta file []
-? displays help
https://github.com/hall-lab/speedseq/blob/master/example/example_speedseq_install.sh
support non-freebayes/GATK callers by preprocessing
https://github.com/arq5x/gemini/issues/409 https://gist.github.com/brentp/4db670df147cbd5a2b32
brentp / preprocess.py
https://gist.github.com/brentp/4db670df147cbd5a2b32#file-preprocess-py
Platypus uses its own VCF nomenclature: TC == DP, FR == AF
##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=2015-07-18
##source=Platypus_Version_0.8.1
##platypusOptions={'assemblyRegionSize': 1500, 'trimReadFlank': 0, 'assembleBadReads': 1, 'bamFiles': ['/home/pipeman/ngs_projects/GCAT_Data/NA12878/alignments/NA12878.WEX.100bp30x.PE.ILLUMINA.btrim.snap.hg19.filtered.bam'], 'minVarDist': 9, 'trimSoftClipped': 1, 'minReads': 2, 'qualBinSize': 1, 'refFile': '/home/pipeman/ngs_projects/ngseasy_
##filter="QUAL > 5"
##INFO=<ID=FR,Number=.,Type=Float,Description="Estimated population frequency of variant">
##INFO=<ID=MMLQ,Number=1,Type=Float,Description="Median minimum base quality for bases around variant">
##INFO=<ID=TCR,Number=1,Type=Integer,Description="Total reverse strand coverage at this locus">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Homopolymer run length around variant locus">
##INFO=<ID=WE,Number=1,Type=Integer,Description="End position of calling window">
##INFO=<ID=Source,Number=.,Type=String,Description="Was this variant suggested by Playtypus, Assembler, or from a VCF?">
##INFO=<ID=FS,Number=.,Type=Float,Description="Fisher's exact test for strand bias (Phred scale)">
##INFO=<ID=WS,Number=1,Type=Integer,Description="Starting position of calling window">
##INFO=<ID=PP,Number=.,Type=Float,Description="Posterior probability (phred scaled) that this variant segregates">
##INFO=<ID=TR,Number=.,Type=Integer,Description="Total number of reads containing this variant">
##INFO=<ID=NF,Number=.,Type=Integer,Description="Total number of forward reads containing this variant">
##INFO=<ID=TCF,Number=1,Type=Integer,Description="Total forward strand coverage at this locus">
##INFO=<ID=NR,Number=.,Type=Integer,Description="Total number of reverse reads containing this variant">
##INFO=<ID=TC,Number=1,Type=Integer,Description="Total coverage at this locus">
##INFO=<ID=END,Number=.,Type=Integer,Description="End position of reference call block">
##INFO=<ID=MGOF,Number=.,Type=Integer,Description="Worst goodness-of-fit value reported across all samples">
##INFO=<ID=SbPval,Number=.,Type=Float,Description="Binomial P-value for strand bias test">
##INFO=<ID=START,Number=.,Type=Integer,Description="Start position of reference call block">
##INFO=<ID=ReadPosRankSum,Number=.,Type=Float,Description="Mann-Whitney Rank sum test for difference between in positions of variants in reads from ref and alt">
##INFO=<ID=MQ,Number=.,Type=Float,Description="Root mean square of mapping qualities of reads at the variant position">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant-quality/read-depth for this variant">
##INFO=<ID=SC,Number=1,Type=String,Description="Genomic sequence 10 bases either side of variant position">
##INFO=<ID=BRF,Number=1,Type=Float,Description="Fraction of reads around this variant that failed filters">
##INFO=<ID=HapScore,Number=.,Type=Integer,Description="Haplotype score measuring the number of haplotypes the variant is segregating into in a window">
##INFO=<ID=Size,Number=.,Type=Integer,Description="Size of reference call block">
##FILTER=<ID=GOF,Description="Variant fails goodness-of-fit test.">
##FILTER=<ID=badReads,Description="Variant supported only by reads with low quality bases close to variant position, and not present on both strands.">
##FILTER=<ID=alleleBias,Description="Variant frequency is lower than expected for het">
##FILTER=<ID=hp10,Description="Flanking sequence contains homopolymer of length 10 or greater">
##FILTER=<ID=Q20,Description="Variant quality is below 20.">
##FILTER=<ID=HapScore,Description="Too many haplotypes are supported by the data in this region.">
##FILTER=<ID=MQ,Description="Root-mean-square mapping quality across calling region is low.">
##FILTER=<ID=strandBias,Description="Variant fails strand-bias filter">
##FILTER=<ID=SC,Description="Variants fail sequence-context filter. Surrounding sequence is low-complexity">
##FILTER=<ID=QualDepth,Description="Variant quality/Read depth ratio is low.">
##FILTER=<ID=REFCALL,Description="This line represents a homozygous reference call">
##FILTER=<ID=QD,Description="Variants fail quality/depth filter.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=GQ,Number=.,Type=Integer,Description="Genotype quality as phred score">
##FORMAT=<ID=GOF,Number=.,Type=Float,Description="Goodness of fit value">
##FORMAT=<ID=NR,Number=.,Type=Integer,Description="Number of reads covering variant location in this sample">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype log10-likelihoods for AA,AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites">
##FORMAT=<ID=NV,Number=.,Type=Integer,Description="Number of reads containing variant in this sample">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=OLD_VARIANT,Number=1,Type=String,Description="Original chr:pos:ref:alt encoding">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878.WEX.100bp30x.PE.ILLUMINA.btrim.snap.hg19
chr1 14653 . C T 96 PASS AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=2;HapScore=2;MGOF=2;MMLQ=40;MQ=34.13;NF=4;NR=0;NS=1;PP=96;QD=32.6928;SC=GTCAGAGCAACGGCCCAAGTC;SbPval=1;Source=Platypus,Assembler;TC=11;TCF=9;TCR=2;TR=4;WE=14661;WS=14643 GT:GL:GOF:GQ:NR:NV 0/1:-13.69,0,-15.39:2:99:11:4
chr1 14907 . A G 1798 PASS AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=1;HapScore=1;MGOF=26;MMLQ=35;MQ=39.8;NF=34;NR=23;NS=1;PP=1798;QD=32.1539;SC=AAATACAGGAAGAAAAAGGCA;SbPval=0.65;Source=Platypus,Assembler;TC=63;TCF=36;TCR=27;TR=57;WE=14915;WS=14897 GT:GL:GOF:GQ:NR:NV 0/1:-183.84,0,-4.54:26:45:63:57
chr1 14930 . A G 2288 PASS AC=2;AF=1;AN=2;BRF=0;FR=1;HP=1;HapScore=1;MGOF=24;MMLQ=31;MQ=39.56;NF=43;NR=24;NS=1;PP=2288;QD=34.6682;SC=ACAGAATTACAAGGTGCTGGC;SbPval=0.52;Source=Platypus,Assembler;TC=70;TCF=45;TCR=25;TR=67;WE=14938;WS=14920 GT:GL:GOF:GQ:NR:NV 1/1:-232.3,-10.97,0:24:99:70:67
chr1 15118 . A G 288 PASS AC=1;AF=0.5;AN=2;BRF=0;FR=0.5006;HP=2;HapScore=2;MGOF=0;MMLQ=33;MQ=28.68;NF=1;NR=9;NS=1;PP=288;QD=32.2771;SC=CCCCCATGACACTCCCCAGCC;SbPval=0.58;Source=Platypus,Assembler;TC=12;TCF=1;TCR=11;TR=10;WE=15126;WS=15108 GT:GL:GOF:GQ:NR:NV 0/1:-32.89,0,-2.89:0:29:12:10
chr1 16495 . G C 79 PASS AC=1;AF=0.5;AN=2;BRF=0;FR=0.5;HP=1;HapScore=2;MGOF=20;MMLQ=35;MQ=25.13;NF=4;NR=0;NS=1;PP=79;QD=28.4428;SC=TATTTGAAATGGAAACTATTC;SbPval=1;Source=Platypus,Assembler;TC=11;TCF=10;TCR=1;TR=4;WE=16503;WS=16485 GT:GL:GOF:GQ:NR:NV 1/0:-11.99,0,-22.69:20:99:11:4
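Since Platypus reports depth as TC and frequency as FR, one possible preprocessing step is to copy those values into the standard DP and AF keys when they are absent. The sketch below is an assumption about how that could be done with awk; it is not the preprocess.py gist linked above, and the function name `add_std_tags` and the added header descriptions are hypothetical.

```shell
# Hypothetical helper: read a Platypus VCF on stdin and write a VCF on stdout
# with INFO DP copied from TC and INFO AF copied from FR, only when the
# standard key is not already present on that record.
add_std_tags() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##INFO=<ID=DP,/ { have_dp_hdr = 1 }
    /^##INFO=<ID=AF,/ { have_af_hdr = 1 }
    /^#CHROM/ {
      # Declare the new keys just before the column header line if missing.
      if (!have_dp_hdr) print "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total coverage, copied from Platypus TC\">"
      if (!have_af_hdr) print "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Allele frequency, copied from Platypus FR\">"
    }
    /^#/ { print; next }
    {
      n = split($8, kv, ";")
      tc = ""; fr = ""; has_dp = 0; has_af = 0
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^TC=/) tc = substr(kv[i], 4)        # TCF= and TCR= do not match
        else if (kv[i] ~ /^FR=/) fr = substr(kv[i], 4)
        else if (kv[i] ~ /^DP=/) has_dp = 1
        else if (kv[i] ~ /^AF=/) has_af = 1
      }
      if (tc != "" && !has_dp) $8 = $8 ";DP=" tc
      if (fr != "" && !has_af) $8 = $8 ";AF=" fr
      print
    }'
}
```

The records shown above already carry AF (added alongside AC, AN and NS), so for them only DP would be appended. Usage would be along the lines of `zcat in.vcf.gz | add_std_tags | bgzip -c > out.vcf.gz`, with the file names as placeholders.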
Freebayes vcf header
##fileformat=VCFv4.1
##fileDate=20150719
##source=freeBayes v0.9.21-19-gc003c1e
##reference=/home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_hg19/ucsc.hg19.fasta
##phasing=none
##commandline="freebayes -f /home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_hg19/ucsc.hg19.fasta -b /home/pipeman/ngs_projects/GCAT_Data/NA12878/alignments/NA12878.WEX.100bp30x.PE.ILLUMINA.no-trim.bwa.hg19.filtered.bam --min-coverage 2 --min-mapping-quality 20 --min-base-quality 20 --min-repeat-entropy 1 --genotype-qualities --
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred">
##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred">
##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele quality sum in phred for partial observations">
##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations">
##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand">
##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand">
##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand">
##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand">
##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous">
##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome">
##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPL,Number=A,Type=Float,Description="Reads Placed Left: number of reads supporting the alternate balanced to the left (5') of the alternate allele">
##INFO=<ID=RPR,Number=A,Type=Float,Description="Reads Placed Right: number of reads supporting the alternate balanced to the right (3') of the alternate allele">
##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio. Ratio between depth in samples with each called alternate allele and those without.">
##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best.">
##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout.">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing. Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR.">
##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position.">
##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles.">
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles">
##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles">
##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments">
##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments">
##INFO=<ID=technology.ILLUMINA,Number=A,Type=Float,Description="Fraction of observations supporting the alternate observed in reads from ILLUMINA">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878.WEX.100bp30x.PE.ILLUMINA.no-trim.bwa.hg19
chrM 73 . G A 23.7578 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=5.46559;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=44;QR=0;RO=0;RPL=2;RPP=7.35324;RPPR=0;RPR=0;RUN=1;SAF=2;SAP=7.35324;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp;t
chrM 150 . TCT CCC 211.135 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=7;CIGAR=1X1M1X;DP=7;DPB=7;DPRA=0;EPP=3.32051;EPPR=0;GTI=0;LEN=3;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=14.3092;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=252;QR=0;RO=0;RPL=4;RPP=3.32051;RPPR=0;RPR=3;RUN=1;SAF=2;SAP=5.80219;SAR=5;SRF=0;SRP=0;SRR=0;TYPE=
ADD svtools
https://github.com/hall-lab/svtools
Not a strict NGSeasy item, but set up Elasticluster on Rosalind: https://elasticluster.readthedocs.org/en/latest/
get http://www.ncbi.nlm.nih.gov/assembly
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.15_GRCh38
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/README_analysis_sets.txt
https://github.com/mozack/abra
ABRA - Assembly Based ReAligner
ABRA is a realigner for next generation sequencing data. It uses localized assembly and global realignment to align reads more accurately, thus improving downstream analysis (detection of indels and complex variants in particular).
Here is an ABRA realigned region (original reads on top, ABRA realigned reads on bottom). The original set of reads have rather "noisy" alignments with several variations from the reference and a fair bit of high quality soft clipping. The ABRA realignments present a more parsimonious representation of the reads including a previously unobserved large deletion.
bam to fastq
sort bam index bam then samtools?
Gel
- Illumina BAM and VCFs
- recall pipeline
- realign pipeline
Isaac BAMs? Do they have ALL reads, or does it chuck unmapped reads?
automated version check and control flags
docker ngs tool build version ngseasy scripts version
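One way the automated version check could work is a small comparison helper that the scripts call before running. This is only a sketch: the function name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on "sort -V", so it assumes GNU coreutils (or a compatible sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$installed" 1.8 || echo "tool too old"` would flag an installed version below 1.8, where `$installed` stands for whatever version string the tool or container reports.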