Azure URIs and GenomicsDB
@lbergelson, just want to discuss some issues here:
- We currently have to use --avoid-nio with --sample-name-map and --bypass-feature-reader to get GenomicsDBImport to work with Azure URIs. Why don't we just merge the --avoid-nio functionality with --bypass-feature-reader, that is, allow GenomicsDB to process the URIs by default?
- Noticed that the only way to use Azure URIs for VCF names is by using --sample-name-map. Directly specifying VCFs with the -V option is not possible because --avoid-nio cannot be used in conjunction with it. Should this be supported?
- @lbergelson, w.r.t. malformed Azure URIs, GenomicsDB does put out an error:
11:10:12.658 error NativeGenomicsDB - pid=30608 tid=2980282 htslib_plugin could not open file az://genomicsdb@oda/vcfs/t0.vcf.gz [TileDB::StorageManagerConfig] Error: Azure Storage Blob initialization failed for home=az://genomicsdb@container/vcfs/sample.vcf.gz; ; Azure Blob URI does not seem to have either an account or a container: Protocol error
[E::hts_open_format] Failed to open file "az://genomicsdb@container/vcfs/sample.vcf.gz" : Input/output error
Is this not sufficient? These are the acceptable Azure URIs currently:
az://<container_name>@<account_name>.blob/<folder>/<file> # for default endpoints
az://<container_name>@<account_name>.blob.core.windows.net/<folder>/<file> # if the endpoint is blob.core.windows.net
azb://<container_name>/<folder>/<file> # following java.nio for azure URIs
azb://<container_name>/<folder>/<file>?account=<account_name>&endpoint=<endpoint>
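To make the four forms above concrete, here is a minimal Python sketch that checks a URI against them before it ever reaches GenomicsDB. This is only my reading of the list above, not GenomicsDB's actual parser: the function name `parse_azure_uri`, the regular expressions, and the rule that the query form needs both `account` and `endpoint` are all assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs

# az://<container_name>@<account_name>.blob[.<rest of endpoint>]/<folder>/<file>
# (assumption: the host part must contain ".blob", as in the two az:// forms listed)
AZ_RE = re.compile(
    r"^az://(?P<container>[^/@]+)@(?P<account>[^/@.]+)\.blob"
    r"(?P<endpoint_suffix>\.[^/]+)?/(?P<path>.+)$"
)

# azb://<container_name>/<folder>/<file>[?account=<account_name>&endpoint=<endpoint>]
AZB_RE = re.compile(
    r"^azb://(?P<container>[^/?]+)/(?P<path>[^?]+)(?:\?(?P<query>.*))?$"
)


def parse_azure_uri(uri: str) -> dict:
    """Return the pieces of `uri` if it matches one of the listed forms.

    Raises ValueError otherwise. Hypothetical helper, not part of GATK
    or GenomicsDB.
    """
    match = AZ_RE.match(uri)
    if match:
        return {
            "scheme": "az",
            "container": match.group("container"),
            "account": match.group("account"),
            "path": match.group("path"),
        }

    match = AZB_RE.match(uri)
    if match:
        result = {
            "scheme": "azb",
            "container": match.group("container"),
            "path": match.group("path"),
        }
        if match.group("query") is not None:
            query = parse_qs(match.group("query"))
            # Assumption: when a query string is given, both keys are required.
            if not {"account", "endpoint"} <= query.keys():
                raise ValueError(f"azb:// query needs account and endpoint: {uri}")
            result["account"] = query["account"][0]
            result["endpoint"] = query["endpoint"][0]
        return result

    raise ValueError(f"not one of the accepted Azure URI forms: {uri}")
```

With this sketch, `az://genomicsdb@container/vcfs/sample.vcf.gz` from the log above is rejected because the host part has no `.blob`, while something like `az://vcfs@myaccount.blob/dir/t0.vcf.gz` (a made-up account and container) is accepted.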
- @nalinigans It's a very reasonable question. It's true, the --avoid-nio flag is technically redundant; you can recreate it with a combination of other flags. I added it because a) I didn't realize that was the case when I started adding it, and b) the combination of flags was kind of complicated, so it was helpful to have something that gave you clear instructions about what you needed to enable.
I think we could merge them, although I think there is one sanity check we do even when --bypass-feature-reader is turned on that we would need to turn off. I basically added "something that works for Megan's project right now."
- Yes, the various cases were getting complicated and I had a bug when -V was enabled, so I just disabled it as an option. It would make sense to add -V support for Azure files. I just didn't do it because I was in a rush and I figured it was better to disable it than to have it potentially be wrong.
- Yeah, that's the error I saw. It's definitely better than nothing. It would be great if it could be propagated back up to the Java layer as a Java exception though. It currently ends the program with SIGABRT, I think, which doesn't play that nicely with various reporting and retry mechanisms. Not super high priority, but nice if you have the cycles.
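To illustrate the last point, here is a rough Python analogy (not GATK or GenomicsDB code) of what a wrapper script sees in the two cases on a POSIX system. The child programs, the helper names `run_child` and `describe_exit`, and the error message are all made up for the example. A process that aborts only leaves a signal number behind, while one that fails through a normal exception exits with a regular status and a message that a reporting or retry layer can inspect.

```python
import signal
import subprocess
import sys


def run_child(code: str) -> subprocess.CompletedProcess:
    """Run a tiny Python program in a child process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )


def describe_exit(proc: subprocess.CompletedProcess) -> str:
    """Summarise how the child ended, the way a retry wrapper might."""
    if proc.returncode < 0:
        # On POSIX, a negative return code means "terminated by this signal".
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else ""
        return f"exited with code {proc.returncode}: {detail}"
    return "ok"


# Stand-in for native code calling abort(): no exception, just a signal.
aborted = run_child("import os; os.abort()")

# Stand-in for the same failure surfaced as an ordinary exception.
raised = run_child("raise OSError('could not open file')")

print(describe_exit(aborted))
print(describe_exit(raised))
```

In the first case the wrapper can only report the signal; in the second it gets an exit code and the exception text, which is the kind of information the Java layer would have if the native error were turned into a Java exception.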