bam-js

Htsget unable to fetch header from some endpoints

Open cmdcolin opened this issue 3 years ago • 7 comments

For reference, see the GA4GH endpoint here:

https://github.com/samtools/hts-specs/pull/530

The GA4GH endpoint rejects the bogus refname that we currently provide.

We would have to adjust the code to not specify any refname, but then it returns very large data chunks, and we would have to range-request the results it gives back.
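One way to range-request what the server hands back is to walk the `urls` array of the htsget ticket in order, send each block's `headers` (usually a `Range`) along with the GET, and concatenate the bytes. This is a minimal TypeScript sketch under stated assumptions, not bam-js code: `HtsgetTicket`, `fetchTicketBlocks` and the injected `BlockFetcher` are hypothetical names, and only the ticket fields used here (`url`, `headers`, `class`) are modeled.

```typescript
// Minimal (hypothetical) shape of an htsget ticket, covering only the fields used here.
interface HtsgetUrl {
  url: string;
  headers?: Record<string, string>;
  class?: "header" | "body";
}

interface HtsgetTicket {
  htsget: {
    format?: string;
    urls: HtsgetUrl[];
  };
}

// Hypothetical injected fetcher: performs one GET with the given headers and returns the bytes.
type BlockFetcher = (url: string, headers: Record<string, string>) => Promise<Uint8Array>;

// Fetch every block listed in the ticket, in order, and concatenate the results.
async function fetchTicketBlocks(ticket: HtsgetTicket, fetchBlock: BlockFetcher): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for (const block of ticket.htsget.urls) {
    // Forward the per-block headers (typically a Range header) unchanged.
    chunks.push(await fetchBlock(block.url, block.headers ?? {}));
  }
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

A `BlockFetcher` could wrap `fetch(url, { headers })` and return `new Uint8Array(await res.arrayBuffer())`; injecting it keeps the concatenation logic testable without a network. Note that this does nothing about oversized ranges, so against a server that answers `class=header` with the whole file it would still download everything.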

cmdcolin, Nov 09 '20 02:11

I suspect that's partly being tackled by @jb-adams in https://github.com/ga4gh/htsget-refserver/pull/8 ?

brainstorm, Nov 09 '20 21:11

Could be! On some level, I think this code should behave more like samtools and work this out on its own, but I'll certainly check the PR, especially if it is deployed somewhere.

cmdcolin, Nov 09 '20 22:11

For dnanexus's web server, we request a bogus refname because otherwise it says the "header" involves downloading 10GB of data, and we don't try to "subselect" the range that it gives us.
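A possible subselection step, assuming the BAM header sits at the start of the first returned range, is to rewrite the `Range` header so only a bounded prefix is requested. `parseByteRange` and `subselectRange` are hypothetical helpers, and the 65536-byte cap in the usage line is an arbitrary illustration. Since the header length isn't known in advance, a real client would presumably grow the cap (for example by doubling) until the header parses; open-ended ranges such as `bytes=5-` are deliberately not handled here.

```typescript
// Parse a "bytes=START-END" Range header (inclusive offsets, as in HTTP).
function parseByteRange(header: string): { start: number; end: number } | undefined {
  const m = /^bytes=(\d+)-(\d+)$/.exec(header.trim());
  if (!m) {
    return undefined; // open-ended, multi-range, or malformed: not handled in this sketch
  }
  const start = Number(m[1]);
  const end = Number(m[2]);
  return end >= start ? { start, end } : undefined;
}

// Shrink a Range header so that at most maxBytes bytes are requested from its start.
function subselectRange(header: string, maxBytes: number): string | undefined {
  const r = parseByteRange(header);
  if (!r || maxBytes < 1) {
    return undefined;
  }
  const end = Math.min(r.end, r.start + maxBytes - 1);
  return `bytes=${r.start}-${end}`;
}
```

For example, `subselectRange("bytes=0-149999999999", 65536)` returns `"bytes=0-65535"`, while a range already smaller than the cap is returned unchanged.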

We could consider dropping support for dnanexus's htsget server so that GA4GH's htsget server works, find a fix that accommodates both, or just leave it as is.

cmdcolin, Jul 07 '21 14:07

See the behavior of the dnanexus server here:

# range is the entire file (e.g. 140 GB), which our code doesn't currently try to subselect from, resulting in bad behavior if used
http://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37/NA12878?class=header

# reasonable size; all the data is even encoded in a data URI
http://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37/NA12878?class=header&referenceName=DOES_NOT_EXIST
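Because the `DOES_NOT_EXIST` response carries its data inline as a `data:` URI, a client has to decode that URL rather than fetch it over HTTP. A minimal sketch of an RFC 2397 decoder, assuming the payload is either base64 or percent-encoded UTF-8 text; `decodeDataUri` is a hypothetical name, while `atob` and `TextEncoder` are the standard globals in browsers and recent Node.js.

```typescript
// Decode a data: URI (RFC 2397) into raw bytes. Handles base64 and percent-encoded payloads.
function decodeDataUri(uri: string): Uint8Array {
  // group 1: media type (may be empty), group 2: ";base64" marker, group 3: payload
  const m = /^data:([^,]*?)(;base64)?,([\s\S]*)$/i.exec(uri);
  if (!m) {
    throw new Error("not a data: URI");
  }
  const payload = m[3];
  if (m[2] !== undefined) {
    // atob returns a "binary string" with one character per byte.
    const binary = atob(payload);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
  }
  // Assumption: non-base64 payloads are percent-encoded UTF-8 text.
  return new TextEncoder().encode(decodeURIComponent(payload));
}
```

For instance, `decodeDataUri("data:application/octet-stream;base64,QkFNAQ==")` yields the four bytes `BAM\1`, which is the BAM magic number. Real BAM data would additionally be BGZF-compressed, which this sketch does not address.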

cmdcolin, Jul 07 '21 14:07

IIRC the htsnexus htsget server might not be as up to date as the GA4GH reference htsget server? Please refer to the official public GA4GH server endpoints mentioned here:

https://github.com/igvteam/igv.js/issues/1187#issuecomment-858314458

So yes, I'd consider dropping support for previous spec versions, tbh.

/cc @mlin @ohofmann

brainstorm, Jul 08 '21 01:07

Yeah, that was the impetus for the comment. However, my workaround for the dnanexus server (adding a random referenceName to the class=header request) does not work with the GA4GH server. I figured the hack of adding a random refname wouldn't be great, but I've got to figure out what to do next.
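One possible fix that accommodates both servers is to keep the bogus-refname workaround but fall back to a plain `class=header` request when the server rejects it. This is a sketch under assumptions: the thread only says the GA4GH server rejects the bogus refname, so treating any non-2xx status as a rejection is a guess, and `headerTicketUrl`, `fetchHeaderTicket` and the injected `TicketFetcher` are hypothetical names. `DOES_NOT_EXIST` is the placeholder refname from the dnanexus URL earlier in the thread.

```typescript
// Result of one ticket request: HTTP status plus parsed JSON body.
interface TicketResponse {
  status: number;
  body: unknown;
}

// Hypothetical injected fetcher: performs a GET and parses the JSON body.
type TicketFetcher = (url: string) => Promise<TicketResponse>;

// Build a class=header ticket URL, optionally with a referenceName.
function headerTicketUrl(base: string, id: string, referenceName?: string): string {
  const url = new URL(`${base.replace(/\/+$/, "")}/${id}`);
  url.searchParams.set("class", "header");
  if (referenceName !== undefined) {
    url.searchParams.set("referenceName", referenceName);
  }
  return url.toString();
}

// Try the bogus-refname workaround first; if it is rejected, retry without referenceName.
async function fetchHeaderTicket(base: string, id: string, fetchTicket: TicketFetcher): Promise<unknown> {
  const attempts = [headerTicketUrl(base, id, "DOES_NOT_EXIST"), headerTicketUrl(base, id)];
  let last: TicketResponse | undefined;
  for (const url of attempts) {
    last = await fetchTicket(url);
    if (last.status >= 200 && last.status < 300) {
      return last.body;
    }
  }
  throw new Error(`htsget header request failed with status ${last?.status}`);
}
```

With the dnanexus base URL, the first attempt produces exactly the `?class=header&referenceName=DOES_NOT_EXIST` URL from the thread. The fallback response may still describe a very large byte range, so it would need the range subselection discussed earlier before being fetched. A stricter version might only retry on 400 or 404 rather than on every failure, including authentication errors.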

cmdcolin, Jul 08 '21 13:07

That page is now gone, along with the deprecated endpoints. I was about to suggest using the official GA4GH htsget endpoint, but it seems to have been having issues for a couple of weeks now?

[Screenshot: 2022-01-19 at 3:19:01 pm]

/cc @jb-adams can you bring that one back up, please? /cc @victorskl @andrewpatto

brainstorm, Jan 19 '22 04:01