PyVCF
BCF compatibility
I have been playing around with PyVCF and found that I could read BCF files by adding the lines marked below to the reader object:
    if fsock:
        self._reader = fsock
        if filename is None and hasattr(fsock, 'name'):
            filename = fsock.name
            compressed = compressed or filename.endswith('.gz')
    # <-- New Lines -->
    # If the file is a BCF, read it in using subprocess (needs `import subprocess`).
    # Test `filename` rather than `fsock`: fsock is falsy whenever this branch is reached.
    elif filename and filename.endswith('.bcf'):
        self._reader = subprocess.Popen(['bcftools', 'view', filename],
                                        stdout=subprocess.PIPE).stdout
    # <-- End addition -->
    elif filename:
        compressed = compressed or filename.endswith('.gz')
        self._reader = open(filename, 'rb' if compressed else 'rt')
    self.filename = filename
    if compressed:
        self._reader = gzip.GzipFile(fileobj=self._reader)
        if sys.version > '3':
            self._reader = codecs.getreader('ascii')(self._reader)
bcftools is required for this to work. What are people's thoughts on using bcftools to read BCF data? A few things that could be added:
- Checking whether bcftools is available
- Ensuring the bcf file is indexed
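
A minimal sketch of those two checks, assuming Python 3.3+ (for `shutil.which`) and assuming the index is the `.csi` file that `bcftools index` writes next to the BCF file by default. The helper names are hypothetical and not part of PyVCF:

```python
import os
import shutil


def bcftools_available():
    """Return True if a `bcftools` executable can be found on PATH."""
    return shutil.which("bcftools") is not None


def has_index(bcf_path):
    """Return True if a CSI index file sits next to the BCF file.

    Assumption: the index uses the default name `<file>.bcf.csi`.
    """
    return os.path.exists(bcf_path + ".csi")


def check_bcf_prerequisites(bcf_path):
    """Raise a descriptive error before trying to stream the file through bcftools."""
    if not bcftools_available():
        raise RuntimeError("bcftools was not found on PATH")
    if not has_index(bcf_path):
        raise RuntimeError("no .csi index found for %s" % bcf_path)
```

Calling `check_bcf_prerequisites(filename)` just before the `subprocess.Popen` line would turn a missing tool or index into a clear error message instead of an empty or broken stream.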
I think this would be a great improvement! More and more people are moving to BCF, and I think bcftools is common enough (and easy to install) that it wouldn't present much of a barrier.
In the meantime, thanks for providing this workaround!