htsjdk
htsjdk copied to clipboard
new class: VCFReaderFactory a factory for VCFReader
Description
This PR adds a file VCFReaderFactory that open a VCFReader using a few methods.
VCFReader open(final File vcfFile,boolean requiredIndex);
VCFReader open(final Path vcfFile,boolean requiredIndex);
VCFReader open(final String vcfFile,boolean requiredIndex);
the aim is to provide a generic factory opening instance of VCFReaders instead of opening directltly a VCFReader. This gives a chance to decode variants files that are not VCF (database, multiple VCF, etc...). The default implementation of VCFReaderFactory returns an instance of VCFFileReader.
A new property CUSTOM_VCF_READER_FACTORY in Defaults
is used to define the class of the default implementation.
I added a few tests in VCFReaderFactoryTest that are mostly a copy of VCFReaderTest
there are some problem with travis but I'm not sure it comes from my code
Things to think about before submitting:
- [ ] Make sure your changes compile and new tests pass locally.
- [X] Add new tests or update existing ones:
- A bug fix should include a test that previously would have failed and passes now.
- New features should come with new tests that exercise and validate the new functionality.
- [X] Extended the README / documentation, if necessary
- [X] Check your code style.
- [X] Write a clear commit title and message
- The commit message should describe what changed and is targeted at htsjdk developers
- Breaking changes should be mentioned in the commit message.
@lindenb This is a very reasonable thing to want to do. I think it has pretty strong overlap with what @cmnbroad is working on in the CZI grant work right now to enable codec discovery and support multiple input sources fo things like vcf.
I'm not really a fan of injecting a single custom factory class through an environment variable. I know we've done it before for Reads but it was kind of gross then as well. It's just not very flexible if you for instance want to load from something like mixed sources. Registering dynamically loadable codecs in the manifest and then picking the right one automatically is something we're actively working on right now.
I assume you have a particular alternate VariantContext record sources you want to use? Is there any way to identify what type of source it is by the URI you access it at? We're currently building a system to figure this out for dynamically loaded codecs and a real world vcf example could be really useful.
Maybe you could be a beta tester for the new codec api?
@cmnbroad , thanks for reviewing
I assume you have a particular alternate VariantContext record sources you want to use?
yes for now I've got a wrapper over bcftools (reading BCF) and I'll write another one reading plink files.
Is there any way to identify what type of source it is by the URI you access it at?
yes, I could use bcftools:///path/to/my/bcf
for example
We're currently building a system to figure this out for dynamically loaded codecs and a real world vcf example could be really useful.
so I can wait for your implementation, no problem.