
error in check_dataset: duplicated SNPs

Open PyunJung-Min opened this issue 1 year ago • 4 comments

Hi, thanks for the amazing R package. I am new to COLOC.

I'm trying to integrate GWAS summary data (dataset 1) with eQTL summary data (dataset 2). If I understand correctly, the SNPs in dataset 1 and dataset 2 should be identical. Is that right?

So I merged dataset 1 and dataset 2 by rsid. However, in the eQTL summary data several ENSG genes are matched to a single SNP, so the merged data (dataset 1 plus dataset 2) contains many duplicated SNPs, each paired with a different ENSG gene.

How can I deal with this problem? Or am I preparing the datasets the wrong way?
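The duplication comes from the join itself. A GWAS file has one row per SNP, while an eQTL file has one row per SNP-gene pair, so an inner join on rsid repeats each SNP once for every gene it was tested against. The Python sketch below uses toy, hypothetical data (plain dictionaries with made-up `rsid`, `gene`, `beta` and `varbeta` fields standing in for the real tables). It reproduces the effect and lists the repeated rsids, which is the kind of duplication reported by `check_dataset` in the issue title.

```python
from collections import Counter

# Hypothetical toy inputs: GWAS has one row per SNP,
# eQTL has one row per (SNP, gene) pair. Values are made up.
gwas = [
    {"rsid": "rs1", "beta": 0.10, "varbeta": 0.0004},
    {"rsid": "rs2", "beta": -0.05, "varbeta": 0.0005},
]
eqtl = [
    {"rsid": "rs1", "gene": "ENSG_A", "beta": 0.30, "varbeta": 0.0100},
    {"rsid": "rs1", "gene": "ENSG_B", "beta": -0.02, "varbeta": 0.0090},
    {"rsid": "rs2", "gene": "ENSG_A", "beta": 0.25, "varbeta": 0.0110},
]


def merge_by_rsid(gwas_rows, eqtl_rows):
    """Inner join on rsid: each GWAS row is repeated once per matching eQTL gene."""
    by_rsid = {row["rsid"]: row for row in gwas_rows}
    merged = []
    for e in eqtl_rows:
        g = by_rsid.get(e["rsid"])
        if g is not None:
            merged.append({"rsid": e["rsid"], "gene": e["gene"],
                           "beta_gwas": g["beta"], "beta_eqtl": e["beta"]})
    return merged


def duplicated_snps(rows):
    """Return the rsids that appear more than once, sorted."""
    counts = Counter(row["rsid"] for row in rows)
    return sorted(rsid for rsid, n in counts.items() if n > 1)


merged = merge_by_rsid(gwas, eqtl)
print(len(merged), duplicated_snps(merged))
```

Here `rs1` is tested against two genes, so it appears twice in the merged table even though it appears once in each input.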

Many thanks in advance

Jungmin

PyunJung-Min · Aug 11 '23 04:08

You need to analyse each gene separately; after all, you are testing a separate colocalisation hypothesis for each gene.
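In practice this means replacing the single merged table with one pair of datasets per gene. Below is a minimal Python sketch, assuming hypothetical `rsid` and `gene` fields on the eQTL rows and toy values. Each gene's pair is restricted to SNPs present in both studies, so every rsid appears at most once per dataset. Converting each pair into the list format that `coloc.abf` expects in R, and making one call per gene, is not shown.

```python
from collections import defaultdict

# Hypothetical toy inputs (same shape as real summary statistics, values made up).
gwas = [
    {"rsid": "rs1", "beta": 0.10, "varbeta": 0.0004},
    {"rsid": "rs2", "beta": -0.05, "varbeta": 0.0005},
]
eqtl = [
    {"rsid": "rs1", "gene": "ENSG_A", "beta": 0.30, "varbeta": 0.0100},
    {"rsid": "rs1", "gene": "ENSG_B", "beta": -0.02, "varbeta": 0.0090},
    {"rsid": "rs2", "gene": "ENSG_A", "beta": 0.25, "varbeta": 0.0110},
    {"rsid": "rs3", "gene": "ENSG_B", "beta": 0.01, "varbeta": 0.0080},  # not in GWAS
]


def split_by_gene(gwas_rows, eqtl_rows):
    """Return {gene: (gwas_subset, eqtl_subset)} restricted to SNPs present in both studies."""
    gwas_by_rsid = {row["rsid"]: row for row in gwas_rows}
    eqtl_by_gene = defaultdict(dict)
    for e in eqtl_rows:
        if e["rsid"] not in gwas_by_rsid:
            continue  # SNP not measured in the GWAS; drop it
        per_gene = eqtl_by_gene[e["gene"]]
        if e["rsid"] in per_gene:
            # a SNP listed twice for the same gene is a genuine data problem
            raise ValueError(f"{e['rsid']} listed twice for {e['gene']}")
        per_gene[e["rsid"]] = e
    pairs = {}
    for gene, rows in eqtl_by_gene.items():
        snps = sorted(rows)  # same SNP order in both datasets of the pair
        pairs[gene] = ([gwas_by_rsid[s] for s in snps], [rows[s] for s in snps])
    return pairs


pairs = split_by_gene(gwas, eqtl)
for gene, (g, e) in sorted(pairs.items()):
    print(gene, [row["rsid"] for row in e])
```

Raising an error on a repeated (gene, rsid) pair, rather than silently keeping one row, makes genuinely malformed input visible instead of hiding it.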

-- https://chr1swallace.github.io

chr1swallace · Aug 11 '23 07:08

Thanks for your prompt reply!!

However, my eQTL summary data contains 19,250 genes. Is there a smart way to analyse all 19,250 genes at once, instead of running `coloc.abf` 19,250 times?

Thanks!

Jung-Min

PyunJung-Min · Aug 11 '23 09:08

Sorry, no. But you probably don't want to run all 19,250 genes. You know whether each of them has a significant signal in your region of interest, so you can discard the rest.
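Discarding genes without a signal can be done before any colocalisation is run. Below is a minimal Python sketch, assuming hypothetical `gene`, `position` and `pvalue` fields on the eQTL rows and a region given by `start`/`end` coordinates. The p-value threshold is deliberately left to the analyst. Looping over a few thresholds shows how the candidate list changes. Only the genes returned would then go on to `coloc.abf`, one call per gene.

```python
def genes_with_signal(eqtl_rows, start, end, p_threshold):
    """Return genes whose smallest eQTL p-value within [start, end] is below p_threshold."""
    min_p = {}
    for row in eqtl_rows:
        if start <= row["position"] <= end:
            gene = row["gene"]
            min_p[gene] = min(row["pvalue"], min_p.get(gene, 1.0))
    return sorted(g for g, p in min_p.items() if p < p_threshold)


# Hypothetical toy eQTL rows (made-up genes, positions and p-values).
eqtl = [
    {"gene": "ENSG_A", "position": 1_000, "pvalue": 3e-9},
    {"gene": "ENSG_A", "position": 1_500, "pvalue": 0.2},
    {"gene": "ENSG_B", "position": 1_200, "pvalue": 4e-4},
    {"gene": "ENSG_C", "position": 1_300, "pvalue": 0.6},
    {"gene": "ENSG_D", "position": 9_000, "pvalue": 1e-12},  # outside the region below
]

# Region of interest and candidate thresholds are illustrative choices.
for threshold in (5e-8, 1e-3):
    print(threshold, genes_with_signal(eqtl, 500, 2_000, threshold))
```

Filtering on the minimum p-value within the region is a simple screen. It decides which genes are worth testing, while the colocalisation question itself is still answered per gene.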

chr1swallace · Aug 22 '23 10:08

Thank you for the answer! :)

My goal in using COLOC is to identify causal (target) genes by integrating GWAS summary data for a disease with eQTL summary data. I'd like to select target genes at various p-value thresholds. That's why I tried to run COLOC on all 19,250 genes.

Could you please advise how to approach this? I would appreciate any comments. :) Many thanks

Jung-Min

PyunJung-Min · Aug 23 '23 03:08