
Some SNPs are removed during sumstat standardization even when compared against themselves

Open hsun3163 opened this issue 2 years ago • 0 comments

There are two scenarios in which some SNPs are lost:

  1. By the cugg allele_qc function, as exemplified by the following message:
/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 810: REF:ALT = ALT:REF. They will be removed.
  warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))
/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 810: REF:ALT = ALT:REF. They will be removed.
  warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))
/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 810: REF:ALT = ALT:REF. They will be removed.
  warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))
  2. By the sumstat_standardizer TARGET generation procedure:
Total number of sumstats:  1
{'/mnt/vast/hpc/csg/molecular_phenotype_calling/k9_tensorQTL_results_new/h3k9ac_bed_recipe_h3k9ac_whole.k9_cov.xqtl_protocol_data.filtered.related.filtered.extracted.pca.projected.resid.PEER.merged.1.norminal.cis_long_table.txt': {'ID': 'GENE,CHR,POS,A0,A1', 'CHR': 'chrom', 'POS': 'pos', 'SNP': 'variant_id', 'A0': 'ref', 'A1': 'alt', 'STAT': 'beta', 'SE': 'se', 'P': 'pvalue', 'TSS_D': 'tss_distance', 'maf': 'maf', 'n': 'n', 'ma_samples': 'ma_samples', 'ac': 'ma_count', 'GENE': 'molecular_trait_id', 'molecular_trait_object_id': 'molecular_trait_object_id'}}
Total rows of query:  84395879 Total rows of subject:  84309393
/mnt/vast/hpc/csg/molecular_phenotype_calling/h3ack9_data_intergration/h3ack9_data_intergration.1/h3ack9_data_intergration.1.yml False False
Total number of sumstats:  1
{'/mnt/vast/hpc/csg/molecular_phenotype_calling/k9_tensorQTL_results_new/h3k9ac_bed_recipe_h3k9ac_whole.k9_cov.xqtl_protocol_data.filtered.related.filtered.extracted.pca.projected.resid.PEER.merged.1.norminal.cis_long_table.txt': {'ID': 'GENE,CHR,POS,A0,A1', 'CHR': 'chrom', 'POS': 'pos', 'SNP': 'variant_id', 'A0': 'ref', 'A1': 'alt', 'STAT': 'beta', 'SE': 'se', 'P': 'pvalue', 'TSS_D': 'tss_distance', 'maf': 'maf', 'n': 'n', 'ma_samples': 'ma_samples', 'ac': 'ma_count', 'GENE': 'molecular_trait_id', 'molecular_trait_object_id': 'molecular_trait_object_id'}}
Total rows of query:  84395879 Total rows of subject:  84309393
/mnt/vast/hpc/csg/molecular_phenotype_calling/h3ack9_data_intergration/h3ack9_data_intergration.1/h3ack9_data_intergration.1.yml False False
Total number of sumstats:  1
{'/mnt/vast/hpc/csg/molecular_phenotype_calling/k9_tensorQTL_results_new/h3k9ac_bed_recipe_h3k9ac_whole.k9_cov.xqtl_protocol_data.filtered.related.filtered.extracted.pca.projected.resid.PEER.merged.1.norminal.cis_long_table.txt': {'ID': 'GENE,CHR,POS,A0,A1', 'CHR': 'chrom', 'POS': 'pos', 'SNP': 'variant_id', 'A0': 'ref', 'A1': 'alt', 'STAT': 'beta', 'SE': 'se', 'P': 'pvalue', 'TSS_D': 'tss_distance', 'maf': 'maf', 'n': 'n', 'ma_samples': 'ma_samples', 'ac': 'ma_count', 'GENE': 'molecular_trait_id', 'molecular_trait_object_id': 'molecular_trait_object_id'}}
Total rows of query:  84395879 Total rows of subject:  84309393

Since the file is being compared to a TARGET generated from itself, the number of rows in the query and in the subject should be the same. But they differ: 84395879 rows in the query vs. 84309393 in the subject, i.e. 86486 rows are lost.
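To narrow down which rows go missing, something like the following pandas sketch might help. It is only a diagnostic idea, not part of sumstat_standardizer or cugg: the function name `diagnose_lost_rows` is hypothetical, the key columns are the ones that the log maps to `GENE,CHR,POS,A0,A1`, and the guess that duplicated keys could explain part of the loss is an assumption that has not been verified. The toy data only stands in for the real query and subject tables.

```python
import pandas as pd

# Columns that the log maps to the ID 'GENE,CHR,POS,A0,A1' (taken from the mapping above).
KEY = ["molecular_trait_id", "chrom", "pos", "ref", "alt"]


def diagnose_lost_rows(query: pd.DataFrame, subject: pd.DataFrame, key=KEY):
    """Hypothetical helper: report query rows absent from subject and duplicate-key rows."""
    # Anti-join: keep query rows whose key has no match in subject.
    subject_keys = subject[key].drop_duplicates()
    merged = query.merge(subject_keys, on=key, how="left", indicator=True)
    missing = merged.loc[merged["_merge"] == "left_only", query.columns]
    # Rows that repeat the key of an earlier row; any de-duplication step would drop them.
    duplicated = query[query.duplicated(subset=key, keep="first")]
    return missing, duplicated


# Toy data standing in for the real tables (assumption, not real results).
query = pd.DataFrame({
    "molecular_trait_id": ["peak1", "peak1", "peak1", "peak2"],
    "chrom": [1, 1, 1, 1],
    "pos": [100, 100, 200, 100],
    "ref": ["A", "A", "C", "A"],
    "alt": ["G", "G", "T", "G"],
    "beta": [0.1, 0.1, -0.2, 0.3],
})
# Simulate a TARGET that lost one duplicated row and one unique row.
subject = query.drop_duplicates(subset=KEY).iloc[:-1]

missing, duplicated = diagnose_lost_rows(query, subject)
print("query rows:", len(query), "subject rows:", len(subject))
print("rows with no match in subject:", len(missing))
print("rows with a duplicated key:", len(duplicated))
```

If the two counts together account for the 86486 missing rows on the real data, that would point to where the rows are dropped; if not, the loss happens somewhere else in the TARGET generation.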

hsun3163, Oct 04 '22 16:10