ga4gh-server
Support remote FASTAs using local index
It appears the index files are optional in the current references implementation. We ought to pass the index path explicitly using filepath_index, so that repositories can be read-only directories, or so that the index and the FASTA don't have to be in the same directory. A column would need to be added to the ReferenceSet table.
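A minimal sketch of what storing the index path could look like, assuming a SQLite-backed repository. The table layout, the indexFile column and the helper functions are hypothetical illustrations, not the actual ga4gh-server schema or API:

```python
import sqlite3

# Hypothetical, trimmed-down ReferenceSet table; the real one has more columns.
SCHEMA = """
CREATE TABLE ReferenceSet (
    id TEXT PRIMARY KEY,
    dataUrl TEXT NOT NULL,  -- path or URL of the FASTA
    indexFile TEXT          -- hypothetical new column; NULL = look next to the FASTA
)
"""


def open_repo(path=":memory:"):
    """Create a repository database containing the hypothetical table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_reference_set(conn, set_id, data_url, index_file=None):
    """Record a FASTA and, optionally, an explicit path to its .fai index."""
    conn.execute(
        "INSERT INTO ReferenceSet (id, dataUrl, indexFile) VALUES (?, ?, ?)",
        (set_id, data_url, index_file),
    )
    conn.commit()


def get_fasta_paths(conn, set_id):
    """Return (dataUrl, indexFile); indexFile is None when none was recorded."""
    row = conn.execute(
        "SELECT dataUrl, indexFile FROM ReferenceSet WHERE id = ?", (set_id,)
    ).fetchone()
    if row is None:
        raise KeyError(set_id)
    return row[0], row[1]
```

A NULL indexFile keeps the current behavior of looking for the index next to the FASTA. A non-NULL value would be handed to pysam.FastaFile as filepath_index, which, per the discussion in this thread, pysam may not yet pass through to htslib correctly.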
I agree this would be good, @david4096. I tried to implement it while I was doing the rest of the updates, but it didn't seem to work very well with pysam. I didn't look into it too deeply, but it seemed to me that pysam wasn't passing filepath_index down to htslib properly. It's also complicated by the fact that we seem to have two index files for bgzipped FASTAs, which I don't follow. We should investigate this and open a bug upstream if it really is a problem.
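On the two-index question: by samtools/htslib convention, a bgzipped FASTA gets a .fai (sequence offsets in uncompressed coordinates) plus a .gzi (mapping uncompressed offsets to compressed BGZF block offsets). A minimal sketch of the default paths, assuming those naming conventions and guessing compression from the file extension, which is only a heuristic. default_index_paths is a hypothetical helper, not part of ga4gh-server or pysam:

```python
def default_index_paths(fasta_path):
    """Return the index paths samtools faidx conventionally uses for a FASTA.

    Hypothetical helper: compression is guessed from the extension only.
    """
    # .fai: per-sequence name, length and offsets into the uncompressed text.
    paths = {"fai": fasta_path + ".fai"}
    if fasta_path.endswith((".gz", ".bgz")):
        # .gzi: translates uncompressed offsets into BGZF block offsets,
        # so random access works on the compressed file.
        paths["gzi"] = fasta_path + ".gzi"
    return paths
```

If a repository had to store index paths explicitly, a bgzipped FASTA would presumably need both of these recorded, not just the .fai.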
I've confirmed your experience; feel free to share your thoughts :)
https://github.com/pysam-developers/pysam/issues/270
Excellent, thanks @david4096.
Should we remove the option of optional indexes until pysam fixes this?
I don't think we have that option now, do we? There is no indexFile option to add-referenceset. We do need to do some work on input validation to check whether indexes exist, whether the input is bgzipped, and so on, but this all gets a bit tricky with htslib's eagerness to 'fix' problems rather than throw errors back to the user.
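One way to throw errors back to the user before htslib gets involved is to inspect the file ourselves. A minimal sketch, assuming the BGZF header layout from the SAM/BAM specification (a gzip member with the FEXTRA flag set and a 'BC' extra subfield, which BGZF writers place first) and the usual .fai/.gzi naming. compression_kind and validate_fasta are hypothetical helpers, not existing ga4gh-server or pysam functions:

```python
import os

GZIP_MAGIC = b"\x1f\x8b"


def compression_kind(path):
    """Classify a file as 'bgzf', 'gzip' or 'plain' from its first bytes."""
    with open(path, "rb") as f:
        header = f.read(18)
    if not header.startswith(GZIP_MAGIC):
        return "plain"
    # BGZF = gzip with FLG.FEXTRA (0x04) set and an extra subfield with
    # id 'BC' at bytes 12-13 (assumes it is the first extra subfield).
    if len(header) >= 16 and header[3] & 0x04 and header[12:14] == b"BC":
        return "bgzf"
    return "gzip"


def validate_fasta(path, index_path=None):
    """Raise ValueError describing every problem found; return the kind if OK.

    Hypothetical helper: only checks compression type and index existence.
    """
    if not os.path.exists(path):
        raise ValueError(f"FASTA not found: {path}")
    problems = []
    kind = compression_kind(path)
    if kind == "gzip":
        problems.append(f"{path} is gzip-compressed but not BGZF; recompress with bgzip")
    fai = index_path or path + ".fai"
    if not os.path.exists(fai):
        problems.append(f"missing FASTA index: {fai}")
    if kind == "bgzf" and not os.path.exists(path + ".gzi"):
        problems.append(f"missing BGZF index: {path}.gzi")
    if problems:
        raise ValueError("; ".join(problems))
    return kind
```

Something like validate_fasta could run inside add-referenceset before a path is written to the repository, so that a plain-gzip FASTA or a missing index fails with an explicit message instead of being left to htslib.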