
Genotype upload verification tool

Open lukasmueller opened this issue 5 years ago • 3 comments

Expected Behavior

After a genotype upload, we need a way to verify that everything is correct in the database, either by nd_protocol or by the uploaded nd_file.

The tool should produce summary statistics and run spot checks.

Possibly a graphical viewer?

lukasmueller avatar Apr 22 '19 14:04 lukasmueller

I looked at three tools to give statistics for vcf files.

  1. vcftools - only a very basic summary
  2. bcftools - a little better, but still not much detail
  3. rtg-tools - gives statistics for each accession; requires a chromosome and position for each entry of the VCF file

Would it be possible to add a feature to the download page so that you can get output in VCF format? Then you could compare it to the input using rtg-tools or another method.
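If a VCF export were available, the input-versus-output comparison could also be done with a small script. The following is only a sketch under stated assumptions: `read_calls` and `compare_calls` are hypothetical names, both files are assumed to be tab-separated VCF text with a `GT` key in every FORMAT column, and genotypes are compared as exact strings (so `0/1` and `1/0`, or phased and unphased calls, would be reported as mismatches).

```python
def read_calls(vcf_lines):
    """Map (chrom, pos, sample) -> GT string for every call in a VCF.

    Hypothetical helper: assumes tab-separated VCF text with a GT key
    in the FORMAT column of every record.
    """
    calls = {}
    samples = []
    for raw in vcf_lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], fields[1]
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            calls[(chrom, pos, sample)] = entry.split(":")[gt_index]
    return calls


def compare_calls(uploaded_lines, exported_lines):
    """Report calls that are missing, extra, or different in the export."""
    uploaded = read_calls(uploaded_lines)
    exported = read_calls(exported_lines)
    shared = set(uploaded) & set(exported)
    return {
        "missing": sorted(set(uploaded) - set(exported)),
        "extra": sorted(set(exported) - set(uploaded)),
        "mismatched": sorted(k for k in shared if uploaded[k] != exported[k]),
    }
```

For a spot check rather than a full comparison, the same dictionaries could be sampled at a handful of random keys instead of being compared in full.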

ClayBirkett avatar May 08 '19 17:05 ClayBirkett

@ClayBirkett the file is archived in the server's filesystem, and the genotype data is also in the database. vcftools could be run on the archived VCF in response to a request from the website, but that adds vcftools as a dependency of the website. What kind of summary stats would vcftools give that we could write ourselves in Perl by querying the genotype data from the database?

nickmorales avatar May 08 '19 18:05 nickmorales

@ClayBirkett @nickmorales summary stats on a VCF can be as varied as the INFO fields provided in the input VCF. As a basic check, we may want to focus on:
-> missing data per individual
-> missing data per marker
-> ability to filter on minor allele frequency
-> ability to filter on bi-allelic markers (vs. multiallelic)
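To give a sense of scale, the basic checks listed above can be sketched in a few dozen lines. This is an illustration in Python rather than the Perl used by the code base, and it reads VCF text instead of querying the database; `parse_alleles` and `summarize` are hypothetical names, a call with any missing allele is counted as missing, and the minor allele frequency is only computed for bi-allelic markers.

```python
from collections import Counter


def parse_alleles(entry, gt_index=0):
    """Turn a sample entry such as '0/1:12' into [0, 1]; '.' becomes None."""
    genotype = entry.split(":")[gt_index]
    alleles = genotype.replace("|", "/").split("/")
    return [None if a == "." else int(a) for a in alleles]


def summarize(vcf_lines):
    """Missing-data rates per sample and per marker, plus MAF and allele count."""
    samples = []
    missing_per_sample = Counter()
    markers = []
    for raw in vcf_lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        gt_index = fields[8].split(":").index("GT")
        n_alt = len([a for a in fields[4].split(",") if a != "."])
        allele_counts = Counter()
        n_missing = 0
        for sample, entry in zip(samples, fields[9:]):
            alleles = parse_alleles(entry, gt_index)
            if None in alleles:  # assumption: partially missing counts as missing
                n_missing += 1
                missing_per_sample[sample] += 1
            else:
                allele_counts.update(alleles)
        total = sum(allele_counts.values())
        maf = None
        if n_alt == 1 and total > 0:
            alt_freq = allele_counts[1] / total
            maf = min(alt_freq, 1 - alt_freq)
        markers.append({
            "marker": fields[2],
            "missing_rate": n_missing / len(samples) if samples else 0.0,
            "biallelic": n_alt == 1,
            "maf": maf,
        })
    n_markers = len(markers)
    sample_rates = {
        s: (missing_per_sample[s] / n_markers if n_markers else 0.0)
        for s in samples
    }
    return {"samples": sample_rates, "markers": markers}
```

Filtering then becomes a list comprehension over the marker records, for example `[m for m in stats["markers"] if m["biallelic"] and m["maf"] is not None and m["maf"] >= 0.05]`, where the 0.05 threshold is only an example value.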

As these filters/stats are pretty simple, for code base "sustainability" it might be better to have this code natively rather than in dependencies (although vcftools and bcftools are C/Perl libs). @ClayBirkett, plink is another great lib for these activities.

Other fields of interest that are a bit more advanced computationally, and which would probably require vcftools/bcftools to be added to the dependencies:
-> allele depth
-> Hardy-Weinberg filtering
-> linkage disequilibrium pruning

vcftools and bcftools can do many other types of filtering, but some of them may simply crash if the VCF input format and version are not properly checked at the upload step.
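A format and version check at upload could start from the header alone. The sketch below is a minimal illustration, not the existing upload code: `check_vcf_header` is a hypothetical name, and the list of supported versions is an assumption that would need to match whatever the downstream tools actually accept. It relies only on two things the VCF specification requires: a `##fileformat=VCFvX.Y` first line and a `#CHROM` header line with eight fixed columns.

```python
import re

# Fixed columns required by the VCF specification, in order.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_header(lines, supported_versions=("4.1", "4.2", "4.3")):
    """Return a list of problems found in the VCF header (empty if none).

    supported_versions is an assumption for illustration only.
    """
    first = None
    header = None
    for i, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if i == 0:
            first = line
        if line.startswith("#CHROM"):
            header = line
            break
        if not line.startswith("#"):
            break  # reached the data records without seeing a header line
    if first is None:
        return ["file is empty"]

    problems = []
    match = re.fullmatch(r"##fileformat=VCFv(\d+\.\d+)", first)
    if match is None:
        problems.append("first line is not a ##fileformat=VCFvX.Y declaration")
    elif match.group(1) not in supported_versions:
        problems.append(f"unsupported VCF version {match.group(1)}")

    if header is None:
        problems.append("missing #CHROM header line")
        return problems

    columns = header.split("\t")
    if columns[:8] != REQUIRED_COLUMNS:
        problems.append("fixed columns do not match the VCF specification")
    if len(columns) < 10 or columns[8] != "FORMAT":
        problems.append("no FORMAT column or no sample columns")
    elif len(set(columns[9:])) != len(columns) - 9:
        problems.append("duplicate sample names")
    return problems
```

Returning a list of problems rather than raising on the first one lets the upload page report everything wrong with a file in a single pass. Because the loop stops at the first data record, the check stays cheap even for large files.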

bauchetg avatar May 08 '19 18:05 bauchetg