
Support for vcf.gz files

Open fgvieira opened this issue 3 years ago • 3 comments

It seems Jasmine does not support vcf.gz or bcf files:

Warning: input.vcf.gz ends with .gz, but (b)gzipped VCFs are not accepted
Exception in thread "main" java.lang.Exception: input.vcf.gz is a gzipped file, but only unzipped VCFs are accepted

Since these are quite standard formats, would it be possible for Jasmine to support both vcf.gz and bcf files? Thanks,

fgvieira avatar Mar 09 '22 10:03 fgvieira

Hi,

Thanks for the suggestion! Unfortunately, adding support for vcf.gz and .bcf files would require fairly extensive software changes. Since the majority of SV calling software produces unzipped VCF files, there are no plans to add this in the near future.

Melanie

mkirsche avatar Mar 30 '22 20:03 mkirsche

I understand that it might be a bit of work, but maybe you could use an existing library to read the VCF files, like htsjdk (developed by the Broad Institute).

At this point it has only partial support for VCF (VCFv4.3 can be read but not written and there is no support for BCFv2.2), but at least you can read and write VCFv4.2 (both text and gz versions). And when they implement the rest Jasmine will automatically support them!
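For the vcf.gz half of the request, here is a minimal sketch of what transparent reading could look like using only the Java standard library, with no htsjdk dependency. It assumes the input is consumed line by line through a `BufferedReader`, which may not match how Jasmine actually parses its input. `VcfOpener`, `open`, and `countRecords` are hypothetical names for illustration, not part of Jasmine or htsjdk. The file is detected by its gzip magic bytes rather than its extension, and bgzip output should also be readable this way since it is a series of concatenated gzip members. This does not cover BCF.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Jasmine: opens a VCF as text whether or
// not it is gzip-compressed.
public class VcfOpener {

    // Returns a line reader over the file, decompressing if the first two
    // bytes are the gzip magic number (0x1f 0x8b).
    public static BufferedReader open(Path path) throws IOException {
        BufferedInputStream in = new BufferedInputStream(Files.newInputStream(path));
        try {
            in.mark(2);               // remember the start of the stream
            int b1 = in.read();
            int b2 = in.read();
            in.reset();               // rewind so the reader sees every byte
            boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
            InputStream stream = gzipped ? new GZIPInputStream(in) : in;
            return new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        } catch (IOException e) {
            in.close();               // do not leak the file handle on failure
            throw e;
        }
    }

    // Example consumer: counts variant records, i.e. non-empty lines that do
    // not start with '#'.
    public static int countRecords(Path path) throws IOException {
        int count = 0;
        try (BufferedReader reader = open(path)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty() && !line.startsWith("#")) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With a helper like this, `VcfOpener.countRecords(Paths.get("input.vcf.gz"))` and the same call on the unzipped file should give the same number, and the rest of the parsing code would not need to know which one it was given.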

fgvieira avatar Mar 31 '22 14:03 fgvieira

I would like to bump this. Unzipping VCFs for large datasets is highly undesirable in terms of storage costs. Most bioinformatics tools can operate on either compressed VCFs or some other lightweight binary format, which limits the reusability of the unzipped VCFs. Compression or binary support would be very much appreciated!

tnguyengel avatar Apr 17 '24 13:04 tnguyengel