xqtl-protocol
Some SNPs are removed during sumstat standardization even when a file is compared against itself
There are two scenarios in which SNPs are lost:
- by the cugg allele_qc function, as exemplified by the following message:
/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 810: REF:ALT = ALT:REF. They will be removed.
warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))
- by the sumstat_standardizer TARGET generation procedure:
Total number of sumstats: 1
{'/mnt/vast/hpc/csg/molecular_phenotype_calling/k9_tensorQTL_results_new/h3k9ac_bed_recipe_h3k9ac_whole.k9_cov.xqtl_protocol_data.filtered.related.filtered.extracted.pca.projected.resid.PEER.merged.1.norminal.cis_long_table.txt': {'ID': 'GENE,CHR,POS,A0,A1', 'CHR': 'chrom', 'POS': 'pos', 'SNP': 'variant_id', 'A0': 'ref', 'A1': 'alt', 'STAT': 'beta', 'SE': 'se', 'P': 'pvalue', 'TSS_D': 'tss_distance', 'maf': 'maf', 'n': 'n', 'ma_samples': 'ma_samples', 'ac': 'ma_count', 'GENE': 'molecular_trait_id', 'molecular_trait_object_id': 'molecular_trait_object_id'}}
Total rows of query: 84395879 Total rows of subject: 84309393
/mnt/vast/hpc/csg/molecular_phenotype_calling/h3ack9_data_intergration/h3ack9_data_intergration.1/h3ack9_data_intergration.1.yml False False
Since the file is being compared against a TARGET generated from itself, the number of rows in the query and in the subject should be identical. Instead, they differ: 84395879 rows in the query vs. 84309393 in the subject, so 86486 rows are lost.
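To narrow down which rows go missing, one option is to compare the row IDs of the query against those of the subject directly. The snippet below is a minimal sketch of that idea, not the sumstat_standardizer or cugg implementation: the function names `row_key` and `compare_query_to_subject` are hypothetical, the rows are toy data, and the ID columns are assumed to be the ones the log maps to `'ID': 'GENE,CHR,POS,A0,A1'`. It reports IDs present in the query but absent from the subject, and IDs that occur more than once in the query, since duplicated IDs are one possible (unconfirmed) reason for a row-count mismatch.

```python
from collections import Counter

# Columns assumed to form the row ID ('GENE,CHR,POS,A0,A1' in the log above),
# written with the file's own column names from the same mapping.
ID_COLS = ("molecular_trait_id", "chrom", "pos", "ref", "alt")


def row_key(row, id_cols=ID_COLS):
    """Build a hashable ID for one row (hypothetical helper)."""
    return tuple(str(row[c]) for c in id_cols)


def compare_query_to_subject(query_rows, subject_rows, id_cols=ID_COLS):
    """Return (IDs missing from the subject, IDs duplicated in the query)."""
    query_counts = Counter(row_key(r, id_cols) for r in query_rows)
    subject_keys = {row_key(r, id_cols) for r in subject_rows}
    missing = sorted(k for k in query_counts if k not in subject_keys)
    duplicated = sorted(k for k, n in query_counts.items() if n > 1)
    return missing, duplicated


# Toy rows for illustration only; these are not taken from the real file.
query = [
    {"molecular_trait_id": "peak1", "chrom": "1", "pos": 100, "ref": "A", "alt": "G"},
    {"molecular_trait_id": "peak1", "chrom": "1", "pos": 100, "ref": "A", "alt": "G"},
    {"molecular_trait_id": "peak1", "chrom": "1", "pos": 200, "ref": "A", "alt": "T"},
    {"molecular_trait_id": "peak2", "chrom": "1", "pos": 300, "ref": "C", "alt": "T"},
]
subject = [query[0], query[3]]

missing, duplicated = compare_query_to_subject(query, subject)
print("missing from subject:", missing)
print("duplicated in query:", duplicated)
```

For the real table (about 84 million rows), the same comparison would likely need to be run per chromosome or per molecular trait to keep memory usage manageable.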