htslib
Access token expires when accessing Google Cloud Storage (GS) objects
I’m currently using the GCS_OAUTH_TOKEN environment variable to provide an OAuth access token to samtools so that it can access objects stored in GCS (see #390). Obtaining an access token is fairly easy on a Google compute VM with the command "export GCS_OAUTH_TOKEN=$(gcloud auth application-default print-access-token)". However, the application-default access token that is returned expires after 3600 seconds, so any long-running program or script that invokes samtools after the token has expired is understandably denied access. As a workaround it’s possible to keep track of the expiration time and request a new token before invoking samtools, but this is a serious inconvenience and not always possible.
I’m left wondering why htslib doesn’t just try to request the access token itself when no other means of authentication is provided. As I understand it, the metadata URL for the token is http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token and you need to pass a header of "Metadata-Flavor: Google".
Example:
joe@test:/tmp$ export GCS_OAUTH_TOKEN=$(gcloud auth application-default print-access-token); echo $GCS_OAUTH_TOKEN
ya29.c.XXXXXXXXXXXXXXXXXXXXX
joe@test:/tmp$ curl -s -S https://www.googleapis.com/oauth2/v1/tokeninfo?access_token="$GCS_OAUTH_TOKEN" | jq '.expires_in'
2442
joe@test:/tmp$ curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
{"access_token":"ya29.c.XXXXXXXXXXXXXXXXXXXXX","expires_in":2437,"token_type":"Bearer"}
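The workaround mentioned above (tracking the expiry time and requesting a new token before each samtools call) might look roughly like the sketch below. It assumes the script runs on a Google compute VM where the metadata URL shown above is reachable, and that curl and jq are installed. The names REFRESH_MARGIN, TOKEN_EXPIRES_AT, token_is_stale and refresh_token_if_needed are made up for illustration and are not part of htslib or samtools.

```shell
#!/usr/bin/env bash
# Sketch only: refresh GCS_OAUTH_TOKEN when the cached token is close to expiry.
# All variable and function names here are hypothetical helpers.

REFRESH_MARGIN=300     # assumed safety margin, in seconds
TOKEN_EXPIRES_AT=0     # epoch seconds; 0 forces a fetch on first use
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

# token_is_stale EXPIRES_AT NOW
# Succeeds (exit 0) when NOW is within REFRESH_MARGIN seconds of EXPIRES_AT.
token_is_stale() {
    local expires_at=$1 now=$2
    [ "$now" -ge $(( expires_at - REFRESH_MARGIN )) ]
}

# Fetch a new token from the metadata server only if the cached one is stale.
refresh_token_if_needed() {
    local now response lifetime
    now=$(date +%s)
    token_is_stale "$TOKEN_EXPIRES_AT" "$now" || return 0
    response=$(curl -s -S -f -H "Metadata-Flavor: Google" "$METADATA_URL") || return 1
    GCS_OAUTH_TOKEN=$(printf '%s' "$response" | jq -r '.access_token')
    lifetime=$(printf '%s' "$response" | jq -r '.expires_in')
    export GCS_OAUTH_TOKEN
    # The server reports the remaining lifetime, which can be less than 3600.
    TOKEN_EXPIRES_AT=$(( now + lifetime ))
}

# Possible use before each invocation (bucket and file names are placeholders):
#   refresh_token_if_needed && samtools view -H gs://my_bucket/my_file
```

This only helps between samtools invocations; a single run that outlives its token would still fail, which is part of why the workaround is inconvenient.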
There already seems to be code in hfile_libcurl.c that handles expiring tokens, which could also be used here or at least serve as a model. I then stumbled across the pull request "Add Bearer token support to hfile_libcurl (for htsget)" (#600), which looks to be where that code came from and does exactly what I was just suggesting, by way of an HTS_AUTH_LOCATION setting. Unfortunately I wasn’t able to get it to work for GCS. Is this functionality only for htsget?
HTS_AUTH_LOCATION was created for use with htsget, but it may work with GCS as long as you don't set GCS_OAUTH_TOKEN. The best way to test this would be to use htsfile, which allows you to crank up the verbosity enough to see the https transaction. Based on your curl command-line above, a proof-of-concept might be something like this:
mkfifo /tmp/token_fifo
( while true ; do curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token > /tmp/token_fifo ; done ) &
HTS_AUTH_LOCATION=/tmp/token_fifo ./htsfile -vvvvvvvv -c gs://my_bucket/my_file | head
If that works then it should be possible to try something similar on a longer-running process.
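For a longer-running script, the same idea could be wrapped in a couple of helper functions so the background loop is started once and cleaned up on exit. This is only a sketch built on the proof-of-concept above: fetch_token, start_token_feeder, stop_token_feeder and FEEDER_PID are hypothetical names, and it assumes the metadata server is reachable and GCS_OAUTH_TOKEN is left unset.

```shell
#!/usr/bin/env bash
# Sketch only: serve a fresh token through a FIFO for HTS_AUTH_LOCATION.
# Function and variable names are hypothetical helpers, not htslib features.

# Print the metadata server's JSON token response on stdout.
fetch_token() {
    curl -s -S -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
}

# start_token_feeder FIFO
# Create FIFO and, in the background, write one token response per reader.
start_token_feeder() {
    local fifo=$1
    mkfifo "$fifo" || return 1
    # Opening the FIFO for writing blocks until something opens it for reading,
    # so a token is only requested when it is actually needed.
    ( while true; do fetch_token > "$fifo" || sleep 1; done ) &
    FEEDER_PID=$!
}

# stop_token_feeder FIFO
# Stop the background loop and remove the FIFO.
stop_token_feeder() {
    kill "$FEEDER_PID" 2>/dev/null || true
    rm -f "$1"
}

# Possible use in a long-running script (paths and names are placeholders):
#   start_token_feeder /tmp/token_fifo
#   trap 'stop_token_feeder /tmp/token_fifo' EXIT
#   export HTS_AUTH_LOCATION=/tmp/token_fifo
#   samtools view -H gs://my_bucket/my_file
```

Because the token is requested each time the FIFO is read, there is no expiry time to track in the calling script.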
Confirmed that this does indeed work with htsfile and samtools.
Good to hear this. I'll leave a note here that we need to document this as a better way of supplying the token when using GCS.