SnpSift split outputs only the first chromosome

Open: kmavrommatis opened this issue 1 year ago (5 comments)

Hi, I have a 3.8 GB VCF file produced by a WGS pipeline (alignment to hg38, mutation calling with Mutect2). The file contains chromosomes chr1 ... chr22, chrX, chrY and chrM, in that order.

Running

java -jar SnpSift.jar split $PWD/sample.mnv.hg38.vcf.gz

produces a single file, sample.mnv.hg38.chr1.vcf, which contains only the first few hundred positions of chr1, and the program exits without any error.

I have not managed to reproduce the error with a smaller VCF file, but I am happy to share the full VCF file if necessary.
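One way to confirm that the output is truncated (rather than the input being short) is to count data lines per chromosome in the input and compare those counts with what split writes. The following is a minimal sketch, assuming a plain or gzip/bgzip-compressed VCF; the helper names `open_vcf` and `count_records_per_chrom` are hypothetical and not part of SnpSift. Python's `gzip` module reads every member of a multi-member (concatenated) gzip stream, which is how bgzip files are laid out.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        # gzip.open keeps reading across concatenated gzip members
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records_per_chrom(path):
    """Return a Counter mapping CHROM -> number of data lines, in file order."""
    counts = Counter()
    with open_vcf(path) as fh:
        for line in fh:
            # skip meta-information (##), the #CHROM header, and blank lines
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

For example, `for chrom, n in count_records_per_chrom("sample.mnv.hg38.vcf.gz").items(): print(chrom, n)` lists the expected record count for each chromosome, which can then be checked against the line count of each split output.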

Thanks in advance for any advice or help.

kmavrommatis, Dec 16 '22 22:12

I have a similar problem, although my VCF is 1 TB, mapped to hg19, processed with GATK, and also includes GL contigs. The output file sample.1.vcf contains ~8,500 variants, whereas the original VCF contains ~42,000,000 variants in total. I get the same result with the -l option to split every N lines.

emyli14, Jan 16 '23 23:01

I am having the exact same issue. Did you find any solution?

alejandrogzi, Feb 01 '23 20:02

I have the same issue: it wrote only 124M of data and stopped without any error or any output in the log (run on a cluster).

blueivy1117, Feb 24 '23 04:02

I am having the exact same issue. Did you find any solution?

yhkithub, Jul 18 '23 12:07

@kmavrommatis, any idea why split does not work? It does not give any error either when I try to split a VCF file using the -l argument.

dianacornejo, Feb 05 '24 22:02

I too have had the same problem!

alkaZeltser, May 09 '24 05:05
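As a stopgap while split misbehaves (not a fix for SnpSift itself), a VCF can be split by chromosome with a short script. This is a minimal sketch, assuming a plain or gzip/bgzip-compressed VCF; the function names `open_vcf` and `split_vcf_by_chrom` are hypothetical, and the output naming `<prefix>.<CHROM>.vcf` is only chosen to mirror the sample.mnv.hg38.chr1.vcf file name reported above. Every output file gets a full copy of the header lines.

```python
import gzip
import os


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # reads all concatenated gzip members
    return open(path, "r")


def split_vcf_by_chrom(path, out_dir="."):
    """Write one <prefix>.<CHROM>.vcf per chromosome; return {CHROM: output path}."""
    name = os.path.basename(path)
    for ext in (".vcf.gz", ".vcf"):
        if name.endswith(ext):
            name = name[: -len(ext)]  # strip the extension to get the prefix
            break
    header, handles, outputs = [], {}, {}
    try:
        with open_vcf(path) as fh:
            for line in fh:
                if line.startswith("#"):
                    header.append(line)  # ## meta lines and the #CHROM line
                    continue
                if not line.strip():
                    continue  # ignore blank lines
                chrom = line.split("\t", 1)[0]
                out = handles.get(chrom)
                if out is None:
                    # first record for this chromosome: open a new file with the header
                    out_path = os.path.join(out_dir, f"{name}.{chrom}.vcf")
                    out = open(out_path, "w")
                    out.writelines(header)
                    handles[chrom] = out
                    outputs[chrom] = out_path
                out.write(line)
    finally:
        for out in handles.values():
            out.close()
    return outputs
```

For example, `split_vcf_by_chrom("sample.mnv.hg38.vcf.gz")` would write sample.mnv.hg38.chr1.vcf, sample.mnv.hg38.chr2.vcf, and so on into the current directory. The sketch keeps one open file per chromosome so it also works on unsorted input; that is fine for hg38 or hg19 with GL contigs, but an assembly with many thousands of contigs could run into the operating system's open-file limit.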