VCF2Dis
Many '-nan' values in the .mat output
Hello, I am trying to construct a tree using a VCF file obtained by merging with GATK and bcftools. After generating a .mat file with the VCF2Dis command, I found many '-nan' values in the result. Can you give me some suggestions?
You have too few VCF sites and too many missing genotypes. I recommend generating the VCF directly by merging the GVCFs rather than merging with bcftools, because merging with bcftools leaves many sites missing.
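Both points in the reply above can be checked before running VCF2Dis by counting the sites and the missing genotypes per sample. The following is a minimal awk sketch, assuming an uncompressed VCF in which GT is the first FORMAT field. The function name `vcf_missing_summary` and the four-site demo VCF are made up for illustration and are not part of VCF2Dis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the site count, then one line per sample with
# the number and fraction of missing genotypes.
vcf_missing_summary() {
    awk -F'\t' '
        /^##/ { next }                      # skip meta-information lines
        /^#CHROM/ {                         # sample names start at column 10
            for (i = 10; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            sites++
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")           # GT is the first FORMAT field
                if (f[1] ~ /\./) miss[i]++  # any "." allele counts as missing
            }
        }
        END {
            printf "sites\t%d\n", sites
            for (i = 10; i <= ncol; i++)
                printf "%s\t%d\t%.3f\n", name[i], miss[i], (sites ? miss[i] / sites : 0)
        }
    ' "$1"
}

# Made-up demo data: 4 sites, 3 samples (S1, S2, S3).
demo=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
    printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t./.\t0/1\n'
    printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.\t./.\n'
    printf 'chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1/1:8\t0/0:5\t./.:0\n'
    printf 'chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/0\t./.\t1/1\n'
} > "$demo"

summary=$(vcf_missing_summary "$demo")
printf '%s\n' "$summary"
rm -f "$demo"
```

A sample whose missing fraction is close to 1 shares very few called sites with the other samples, which is one plausible source of undefined distances.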
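Hello, I have run into the same problem. The generated .mat file contains many nan values, so I cannot build the tree. Here is my code:
#1. Merge all GVCFs and run joint calling
Find all GVCF files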
#gvcf_files=$(find $gvcfgz_dir -type f -name "*.gvcf.gz")
Build the input file list and run the Sentieon command
#$SENTIEON_INSTALL_DIR/bin/sentieon driver -t $nt -r $reference
#--algo GVCFtyper
#$(for file in $gvcf_files; do echo -n "-v $file "; done)
#$output_vcf/${name_merged}.vcf
2. Extract SNPs from the merged VCF file with SelectVariants
#gatk --java-options "-Xmx50g" SelectVariants -R $reference -select-type-to-include SNP -V $output_vcf/${name_merged}.vcf -O $output_vcf/${name_merged}.snp.vcf
3. Hard-filter SNPs with VariantFiltration and remove low-quality SNPs (the lines tagged SNP_Filter)
#gatk --java-options "-Xmx50g" VariantFiltration -V $output_vcf/${name_merged}.snp.vcf --filter-expression 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0' --filter-name 'SNP_Filter' -O $output_vcf/${name_merged}.snp.filtering.vcf
#less $output_vcf/${name_merged}.snp.filtering.vcf | grep -v "SNP_Filter" > $output_vcf/${name_merged}.snp.filtered.vcf
4. Filter again with vcftools
vcftools --vcf $output_vcf/${name_merged}.snp.filtered.vcf --max-missing 0.2 --minQ 30 --remove-indels --min-alleles 2 --max-alleles 2 --maf 0.05 --recode --recode-INFO-all --out $output_vcf/${name_merged}.snp.filtered.miss0.2maf0.05.vcf
Is the problem caused by my vcftools filter settings?
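One thing worth checking here is the meaning of `--max-missing`. In vcftools the value runs from 0 (sites that are completely missing are allowed) to 1 (no missing data allowed), so `--max-missing 0.2` still keeps sites where up to 80% of genotypes are missing. The sketch below approximates that rule in awk on a made-up VCF to show how the threshold changes the number of sites kept. The function `filter_by_call_rate` and the demo data are hypothetical, and the sketch is not a replacement for vcftools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep sites whose fraction of called genotypes is at
# least the given minimum (approximating the vcftools --max-missing rule).
filter_by_call_rate() {
    # $1 = uncompressed VCF path, $2 = minimum fraction of called genotypes (0..1)
    awk -F'\t' -v min="$2" '
        /^#/ { print; next }                  # pass header lines through
        {
            called = 0
            total = NF - 9                    # samples start at column 10
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")             # GT is the first FORMAT field
                if (f[1] !~ /\./) called++    # no "." allele means called
            }
            if (total > 0 && called / total >= min) print
        }
    ' "$1"
}

# Made-up demo data: 4 sites, 5 samples, call rates 0.4, 1.0, 0.0 and 0.8.
demo=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\tS5\n'
    printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t./.\t./.\t./.\n'
    printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\t0/0\t0/1\n'
    printf 'chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t./.\t./.\t./.\t./.\t./.\n'
    printf 'chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/0\t1/1\t0/1\t0/0\t./.\n'
} > "$demo"

loose=$(filter_by_call_rate "$demo" 0.2 | grep -vc '^#')
strict=$(filter_by_call_rate "$demo" 0.9 | grep -vc '^#')
echo "sites kept at 0.2: $loose; sites kept at 0.9: $strict"
rm -f "$demo"
```

With a low threshold such as 0.2, sites that are called in only a minority of samples survive the filter, so some sample pairs may have few or no sites where both genotypes are present.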