
VCF / GVCF parsing issue

Open deniseho-98 opened this issue 5 years ago • 5 comments

Hi. I am trying out the code. I understand that the raw files here are genotype files from DTC companies. However, all I have are VCF and GVCF files from an in-house sequencing platform. Can these be used, and if so, how? I have zero knowledge of coding. Thanks a lot!

deniseho-98, Feb 19 '20 08:02

I computed the cMs in shared_DNA_one_chrom.excel. The total comes to 3600 cMs, which DNA Painter reports as a parent/child relationship. But the two people are totally unrelated. Could anyone help?

deniseho-98, Feb 22 '20 03:02

Hi, thanks for the note. Yes, VCF and GVCF files should work if the SNPs are annotated with RSIDs. The files can be loaded as shown in the examples: https://lineage.readthedocs.io/en/latest/readme.html#load-raw-data
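Since the answer above depends on the records carrying RSIDs, here is a minimal sketch for checking that before loading a file. It is not part of snps or lineage; it only uses the standard VCF layout, where the third tab-separated column is the ID field and `.` means no ID. The function names and the file name in the usage note are hypothetical.

```python
import gzip
from typing import Iterable, Tuple


def count_rsid_records(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (records_with_rsid, total_records) for lines of VCF text."""
    with_rsid = 0
    total = 0
    for line in lines:
        # Skip blank lines and header/meta lines, which start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed data record
        total += 1
        # The ID column may hold several IDs separated by ';'.
        ids = fields[2].split(";")
        if any(i.startswith("rs") and i[2:].isdigit() for i in ids):
            with_rsid += 1
    return with_rsid, total


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")
```

For example, `with open_vcf("sample.vcf.gz") as handle: print(count_rsid_records(handle))` would print the two counts for a hypothetical file. If the first number is zero or very small, the file probably needs RSID annotation before it is useful here.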

As for the issue with total shared cMs = 3600, was this for output generated by lineage?

apriha, Feb 25 '20 05:02

Dear Apriha/Lineage,

Thank you so much for replying.

The output from lineage is a spreadsheet, and one of its columns is the cMs (please see attached). The total shared cMs is calculated by adding all the values, no? That's how I obtained ~3600 cMs. When I entered this value on DNA Painter, it showed a parent/child relationship, but the two individuals are actually husband and wife.

Can you please help?

Thank you.

Sincerely, Chai San


deniseho-98, Feb 25 '20 09:02

Hi Chai, sorry, the file didn't come through. But yes, you're correct that the total shared cMs is calculated by adding all values in the cMs column of the shared_dna output files.
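The summation described above can be sketched in a few lines. This assumes the shared_dna output is a CSV with a header row containing a `cMs` column, as named in this thread; the other columns in the usage note and the function name are made up for illustration.

```python
import csv
from typing import TextIO


def total_shared_cms(handle: TextIO, column: str = "cMs") -> float:
    """Sum the values of one column of an open CSV file with a header row."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in header")
    total = 0.0
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # ignore empty cells
            total += float(value)
    return total
```

With a hypothetical file, `with open("shared_dna.csv", newline="") as f: print(total_shared_cms(f))` prints the total. If several shared_dna files were written, summing only one of them gives the total for that file alone.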

I suspect, though, that lineage is finding matches because of the way snps parses the genotypes in the VCF / GVCF. You said that this was an in-house sequencing platform; is that platform also creating the VCF / GVCF files? Thanks again.

apriha, Mar 29 '20 18:03

The package accepts VCF files, but it has not been tested on gVCF files. You will most likely run out of RAM if you try to load a gVCF.
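One possible way around the memory problem is to stream the gVCF and keep only the records that have a real alternate allele, which is what makes a gVCF so much larger than a VCF. The sketch below is an assumption-laden illustration, not something snps provides: it assumes reference-only blocks are marked with an ALT of `<NON_REF>`, `<*>`, or `.`, which depends on the tool that wrote the file. It also discards homozygous-reference positions, and whether the reduced file then loads correctly has not been confirmed in this thread.

```python
from typing import Iterable, Iterator

# ALT values assumed to mean "no real alternate allele here".
SYMBOLIC_NON_REF = {"<NON_REF>", "<*>", "."}


def variant_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and only those records with a real ALT allele."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header/meta lines unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip malformed or blank lines
        alts = fields[4].strip().split(",")
        # Keep the record if at least one ALT is not a symbolic placeholder.
        if any(alt not in SYMBOLIC_NON_REF for alt in alts):
            yield line
```

Because the function is a generator, it handles one line at a time, so `out.writelines(variant_records(src))` between two open text files (hypothetical names) should keep memory use low regardless of the input size.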


willgdjones, Mar 29 '20 19:03