cyvcf2

Building a new Variant without an input VCF

Open nh13 opened this issue 4 years ago • 9 comments

I'd like to build a new Variant without an input VCF, and then write it to a file. I was hoping to see such an example in the test classes, but they all seem to read from on-disk VCFs. Is there a way to do this?

As I write tools using cyvcf2, I want to unit test individual methods that take a variant (type Variant) and do something with it. I don't want to use files on disk to hydrate those input variants (that's a little brittle), so I'd like to create a Variant directly, but that does not seem to be possible.

Am I out of luck or am I missing something?

nh13 avatar Feb 14 '20 23:02 nh13

Probably the best way is via the Writer.from_string and Writer.variant_from_string methods.

nh13 avatar Feb 15 '20 00:02 nh13

Yes, the from_string methods are the way to go. htslib doesn't (or at least pre-1.10 didn't) provide much help here, and I didn't add any helpers. There is some nice precedent for this in hts-nim, with some helpers by @mflevine here: https://github.com/mflevine/hts-nim-sugar/
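For reference, a minimal sketch of the from_string route might look like the following. The header, the record and the helper name `make_variant` are made up for illustration; the only cyvcf2 calls it relies on are `Writer.from_string` and `Writer.variant_from_string`. cyvcf2 is imported inside the helper so that the strings can be inspected without it.

```python
# Header and record are plain strings; their contents are illustrative only.
HEADER = "\n".join([
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=1000>",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]) + "\n"

RECORD = "\t".join(["chr1", "100", ".", "G", "A", "50", "PASS", "DP=10"])


def make_variant(out_path, header=HEADER, record=RECORD):
    """Build a Variant from strings (hypothetical helper, not part of cyvcf2).

    The Writer is returned as well so the caller can write the record and
    close the file.
    """
    from cyvcf2 import Writer  # lazy import: the strings above work standalone

    writer = Writer.from_string(out_path, header)
    variant = writer.variant_from_string(record)
    return writer, variant
```

In a pytest test this could be called as `writer, variant = make_variant(str(tmp_path / "out.vcf"))`, followed by `writer.write_record(variant)` and `writer.close()` if the record should also end up on disk.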

brentp avatar Feb 15 '20 02:02 brentp

@brentp Would you accept a pull request with a simpler writing interface? Have written a wrapper in Python but could make it cython if you'd prefer.

chrissype avatar Mar 09 '20 14:03 chrissype

Ditto, on vacation for a week or so but want to make a PR myself, but happy to work with yours @chrissype.

nh13 avatar Mar 09 '20 14:03 nh13

I think a pure python set-up would be fine. Happy to review a PR.

brentp avatar Mar 09 '20 14:03 brentp

It's pretty compromised as I was aiming for write safety, due to how htslib behaves when you try and write variants that don't conform with the header in some way (i.e. by segfaulting). Hence it doesn't really fit the billing of cyvcf2 as a high-performance option.
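One way to get some of that write safety without touching htslib is to check a record against the header text before it is handed to cyvcf2. The sketch below is pure Python and only covers contig names and INFO keys; `undeclared_fields` is a hypothetical helper, not something cyvcf2 provides, and it makes no claim about which mismatches actually crash htslib.

```python
import re


def undeclared_fields(header, record):
    """Return contig/INFO names used by `record` but not declared in `header`.

    Hypothetical pre-write check; it only looks at the CHROM and INFO columns.
    """
    contigs = set(re.findall(r"^##contig=<ID=([^,>]+)", header, flags=re.M))
    infos = set(re.findall(r"^##INFO=<ID=([^,>]+)", header, flags=re.M))

    fields = record.rstrip("\n").split("\t")
    chrom, info = fields[0], fields[7]

    missing = []
    if chrom not in contigs:
        missing.append("contig:" + chrom)
    if info != ".":
        for entry in info.split(";"):
            key = entry.split("=", 1)[0]  # flags have no "=value" part
            if key not in infos:
                missing.append("INFO:" + key)
    return missing
```

A caller could raise an error or skip the record whenever the returned list is non-empty.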

chrissype avatar Mar 09 '20 15:03 chrissype

I’ll post one in the next week or so.

nh13 avatar Mar 09 '20 15:03 nh13

Did you get anywhere with this @nh13? I've got plenty of time on my hands now and have even done a bit of cython stuff in the meantime so could get an outline going.

In my Python-only version the design ended up being a bit messy, as I wanted a simple pythonic API. But this meant I had to mess around with writing to temp files whenever you wanted to update the header or write to a new file so it's not the neatest design going.

I also saw from some other issues that py2 support is necessary?

chrissype avatar Apr 30 '20 18:04 chrissype

If, like me, you stumble upon this issue for unit-testing purposes, mocking the Variant is a good way to do this.

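from unittest.mock import patch

import cyvcf2


# patch() swaps cyvcf2.Variant for an autospecced mock while the test runs
@patch("cyvcf2.Variant", autospec=True, create=True)
def test_posIsOne(mocked_variant):
    # Look the name up on the module: a `from cyvcf2 import Variant` done
    # before patching would still point at the real class
    assert mocked_variant is cyvcf2.Variant

    mocked_variant.POS = 1
    mocked_variant.REF = "G"

    assert mocked_variant.POS == 1
    assert mocked_variant.REF == "G"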

You can also pass mocked_variant into the function you are trying to test.
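As a sketch of that last point, the mock can be handed straight to the code under test. `is_transition` is a made-up function standing in for your own code, and the mock is given an explicit attribute list (`POS`, `REF`, `ALT`) so that the snippet runs without cyvcf2 installed; `MagicMock(spec=Variant)` would restrict the mock to the real class's attributes instead.

```python
from unittest.mock import MagicMock

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(variant):
    """Made-up function under test: True for a biallelic transition SNP."""
    return len(variant.ALT) == 1 and (variant.REF, variant.ALT[0]) in TRANSITIONS


def test_is_transition():
    # spec given as a list of names: reading any other attribute raises
    # AttributeError, which catches typos in the code under test
    mocked_variant = MagicMock(spec=["POS", "REF", "ALT"])
    mocked_variant.POS = 1
    mocked_variant.REF = "G"
    mocked_variant.ALT = ["A"]
    assert is_transition(mocked_variant)

    # G>T is a transversion, so the function should reject it
    mocked_variant.ALT = ["T"]
    assert not is_transition(mocked_variant)
```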

mbhall88 avatar Jun 03 '20 04:06 mbhall88