
Support remote FASTAs using local index

Open david4096 opened this issue 8 years ago • 5 comments

It appears the index files are optional in the current references implementation. We ought to explicitly pass the index path using filepath_index so that repositories can be read-only directories, or so that the index and FASTA don't have to be in the same directory. A column would need to be added to the ReferenceSet table.

david4096 avatar Apr 27 '16 21:04 david4096

I agree this would be good @david4096. I tried to implement it while I was doing the rest of the updates, but it didn't seem to work very well with pysam. I didn't look into it too deeply, but it seemed to me like pysam wasn't passing the filepath_index down to htslib properly. It's also complicated by the fact that we seem to have two index files for bgzipped FASTAs, which I don't follow. We should investigate this and open a bug upstream if it really is a problem.

jeromekelleher avatar Apr 28 '16 08:04 jeromekelleher

Confirmed your experience, feel free to share your thoughts :)

https://github.com/pysam-developers/pysam/issues/270

david4096 avatar Apr 28 '16 18:04 david4096

Excellent, thanks @david4096.

jeromekelleher avatar May 03 '16 07:05 jeromekelleher

Should we remove the option to allow optional indexes until pysam solves this?

david4096 avatar May 16 '16 20:05 david4096

I don't think we have that option now, do we? There is no indexFile option to add-referenceset. We do need to do some work on input validation to see if indexes exist, whether the input is bgzipped, and so on, but this all gets a bit tricky with htslib's eagerness to 'fix' problems rather than throw errors back to the user.

jeromekelleher avatar May 17 '16 07:05 jeromekelleher
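
For readers following the thread, the input validation mentioned in the last comment (do the indexes exist, is the input bgzipped) might be sketched roughly as below. This is only an illustration and not code from ga4gh-server: the helper names `is_bgzipped` and `missing_indexes` and the `index_dir` parameter are hypothetical. It assumes the `samtools faidx` naming convention, where a FASTA gets a `.fai` index and a bgzipped FASTA additionally gets a `.gzi` index, and it assumes the BGZF `BC` subfield is the first entry in the gzip extra field, which is how bgzip writes it.

```python
import os
import struct


def is_bgzipped(path):
    """Return True if the file starts with what looks like a BGZF block header."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (04)
    if header[0:4] != b"\x1f\x8b\x08\x04":
        return False
    # Bytes 10-11 hold XLEN; the BGZF subfield is 'B', 'C' with SLEN == 2.
    # Simplification: only the first extra subfield is inspected.
    xlen = struct.unpack("<H", header[10:12])[0]
    slen = struct.unpack("<H", header[14:16])[0]
    return xlen >= 6 and header[12:14] == b"BC" and slen == 2


def missing_indexes(fasta_path, index_dir=None):
    """Return the names of expected index files that do not exist.

    index_dir (hypothetical parameter) lets the indexes live apart from the
    FASTA, e.g. when the FASTA sits in a read-only directory.
    """
    base = os.path.basename(fasta_path)
    directory = index_dir if index_dir is not None else os.path.dirname(fasta_path)
    expected = [base + ".fai"]
    if is_bgzipped(fasta_path):
        # Bgzipped FASTAs need a second index for the compressed offsets.
        expected.append(base + ".gzi")
    return [
        name for name in expected
        if not os.path.exists(os.path.join(directory, name))
    ]
```

Checking up front like this would let the server report a clear error to the user before handing the paths to pysam/htslib, rather than relying on how htslib reacts to a missing index.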