About phased.vcf.gz generation during germline calling

Open scg-dgist opened this issue 2 years ago • 6 comments

When I conducted the germline calling process, I noticed that only a limited number of chromosomes were successfully processed into 'phased.vcf.gz,' whereas most of the chromosomes remained unprocessed. I included all standard chromosomes (chromosomes 1-22) as listed in 'region.lst' and utilized the GRCh38 human reference FASTA file for this analysis. Could this variation in success be linked to the inherent low read depth typically associated with 10x scRNA-seq data? Additionally, I'm interested in knowing if there are any potential solutions to address this issue.

Many thanks.

scg-dgist avatar Sep 11 '23 05:09 scg-dgist

Based on our testing, the phasing step should work for most single-cell sequencing platforms (even with only 100 cells). Could you let me know how many SNVs were called on the chromosomes that passed the phasing step? Are those the larger chromosomes (such as chr1, chr2, etc.)?

jinzhuangdou avatar Sep 11 '23 06:09 jinzhuangdou
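
One way to answer the SNV-count question above is to tally the records per chromosome in a germline VCF. The following is a minimal sketch using only the Python standard library; the function name `count_records_by_chrom` and the example path are hypothetical and not part of Monopogen. It counts every non-header record, so indels and multi-allelic sites are included if the file contains them.

```python
import gzip
import sys
from collections import Counter


def count_records_by_chrom(vcf_path):
    """Count non-header records per chromosome in a plain or gzipped VCF."""
    # bgzip-compressed files are valid multi-member gzip, so gzip.open reads them.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_snvs.py chr20.gl.vcf.gz
    for path in sys.argv[1:]:
        for chrom, n in sorted(count_records_by_chrom(path).items()):
            print(f"{path}\t{chrom}\t{n}")
```

Running it on both a chromosome that phased and one that did not makes it easy to see whether the failing chromosomes simply have far fewer records.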

Thank you for the prompt response. The successfully called SNV file corresponds to chromosomes 12, 17, 18, and 22, out of the total 22 chromosomes.

scg-dgist avatar Sep 11 '23 07:09 scg-dgist

Could you share the chr20.gl.vcf.gz file with me so that I can look into why the phasing step failed?

jinzhuangdou avatar Sep 12 '23 16:09 jinzhuangdou

Oh, I have solved the problem. The issue was with the panel VCF file. I find it confusing that when I used the "CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz" from your GitHub repository (located in the example directory), the process proceeded successfully. However, when I downloaded the same file from "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/" (as suggested in your paper), it did not generate the phased.vcf.gz file. Could you please explain if there are additional steps I need to take after downloading the phased.vcf.gz files from the public 1000Genomes project database?

I appreciate your invaluable assistance.

scg-dgist avatar Sep 13 '23 12:09 scg-dgist
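
Since the thread does not establish why the two copies of the panel behave differently, a first step could be to compare their basic structure. The sketch below is an assumption-laden diagnostic, not a confirmed fix: it reads each panel VCF and reports the contig IDs declared in the header, the number of samples, and the CHROM values seen in the first records. The function names `summarize_vcf` and `compare_panels` are hypothetical, and a difference in any of these fields is only a hint about where to look.

```python
import gzip


def summarize_vcf(vcf_path, max_records=1000):
    """Summarize header contigs, sample count, and CHROM values of the first records."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    summary = {"header_contigs": [], "n_samples": 0, "record_chroms": set()}
    seen = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##contig=<"):
                # Example header line: ##contig=<ID=chr20,length=64444167>
                body = line.strip()[len("##contig=<"):-1]
                for field in body.split(","):
                    if field.startswith("ID="):
                        summary["header_contigs"].append(field[3:])
            elif line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first 9 columns are fixed VCF fields; the rest are samples.
                summary["n_samples"] = max(0, len(columns) - 9)
            elif not line.startswith("#") and line.strip():
                summary["record_chroms"].add(line.split("\t", 1)[0])
                seen += 1
                if seen >= max_records:
                    break  # the first records are enough for a naming check
    return summary


def compare_panels(path_a, path_b):
    """Return both summaries plus flags for the fields that match."""
    a, b = summarize_vcf(path_a), summarize_vcf(path_b)
    return {
        "same_record_chroms": a["record_chroms"] == b["record_chroms"],
        "same_sample_count": a["n_samples"] == b["n_samples"],
        "a": a,
        "b": b,
    }
```

Calling `compare_panels` with the chr20 panel from the example directory and the chr20 panel downloaded from the 1000 Genomes FTP site would show whether the two files differ in chromosome naming or sample count before any further debugging.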

Did it generate the .gp.vcf.gz file? If not, could you share the script in the folder ./Script/runGermline_.sh? It contains the full command lines, which would let us debug the issue.

jinzhuangdou avatar Sep 14 '23 17:09 jinzhuangdou
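
To see at a glance which stage stopped producing output, the per-chromosome files can be checked in one pass. This is a minimal sketch that assumes the outputs follow the `<chrom>.gl.vcf.gz`, `<chrom>.gp.vcf.gz`, and `<chrom>.phased.vcf.gz` naming pattern implied by the file names mentioned in this thread; the function name `germline_output_status` and the directory argument are hypothetical placeholders.

```python
from pathlib import Path

# Assumed suffixes, taken from the file names mentioned in this thread.
DEFAULT_SUFFIXES = ("gl.vcf.gz", "gp.vcf.gz", "phased.vcf.gz")


def germline_output_status(out_dir, chromosomes, suffixes=DEFAULT_SUFFIXES):
    """Map each chromosome to {suffix: True if the file exists and is non-empty}."""
    out_dir = Path(out_dir)
    status = {}
    for chrom in chromosomes:
        per_chrom = {}
        for suffix in suffixes:
            path = out_dir / f"{chrom}.{suffix}"
            per_chrom[suffix] = path.is_file() and path.stat().st_size > 0
        status[chrom] = per_chrom
    return status


def missing_outputs(status):
    """List (chromosome, suffix) pairs whose file is absent or empty."""
    return [
        (chrom, suffix)
        for chrom, files in status.items()
        for suffix, ok in files.items()
        if not ok
    ]
```

For example, `missing_outputs(germline_output_status(out_dir, [f"chr{i}" for i in range(1, 23)]))` lists every chromosome and stage with no usable file, which distinguishes a run that never produced .gp.vcf.gz from one that failed only at phasing.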

I have exactly the same problem: germline SNV calling runs smoothly, but the chr.phased.vcf.gz files are not generated. However, when I ran the test with the reference you provided, "CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz", everything worked perfectly.

Thanks for your help

Alexioner8 avatar Jan 17 '25 09:01 Alexioner8