earthaccess
earthaccess copied to clipboard
Option to use UAT environment
Challenge
earthaccess
does not currently support access to collections that are in NASA Earthdata's User Acceptance Testing (UAT) environment. Instead, the production (PROD) environment is assumed when logging in via earthaccess
and there is no option to change the environment.
Desired functionality
When executing the earthaccess.login()
method, an option to specify which environment should be available. And the authentication should be correctly applied to the specified environment.
Example
I encountered this challenge when trying to test data transformation code with a data collection for the Tropospheric Emissions Monitoring of POllution (TEMPO) instrument that is in UAT. I would like to search and download granules from this collection using earthaccess
, but that is currently not possible.
Benefit
If the UAT environment can be accessed with earthaccess
, its usability would be expanded to more dataset testing.
Should we support manually overriding the EDL base URL to anything? That way we can support other existing environments like System Integration Testing (SIT), and any others that may be created in the future.
Supporting SIT would be great too! It might be more secure/simpler to just allow UAT/SIT/PROD for now?
@mfisher87 , @betolink,
Currently, one of the options for .login()
's strategy=
argument is "environment".
Because of that, it seems to me like it might be confusing to add a new keyword argument to .login()
named environment=
, which would result in function calls looking something like this:
earthdata.login(strategy="environment", environment=Env.UAT)
Using "environment" twice where it means two different things looks strange, right?
Would "endpoint" perhaps be a better name for the new keyword argument? Other suggestions?
"maturity" is generally how I think about UAT/SIT/PROD, but "cmr_endpoint" is a good option as well (just endpoint makes me go "to what?")
nice, I like "cmr_endpoint" too
I've started working on this issue but could use some more expert eyes to ensure an appropriate token is available when using UAT. Specifically, my question is:
Will the client_id
parameter on this line — for the self.EDL_GET_PROFILE
variable — need to change when the base of the URL changes from "urs.earthdata.nasa.gov" to "uat.urs.earthdata.nasa.gov"
Yeah, we'll need different client IDs for each environment.
This comment from a few lines up is prescient:
# Maybe all these predefined URLs should be in a constants.py file
The URLs all repeat the hostname too, so maybe instead of
EDL_GET_TOKENS_URL = "https://urs.earthdata.nasa.gov/api/users/tokens"
We would rather store
EDL_GET_TOKENS_PATH = "/api/users/tokens"
and concatenate these values with the variable hostname.
EDIT: Thinking a bit about this, the variable defined just above is what I think of when I think "endpoint", so maybe that's not an ideal term for UAT/SIT/PROD... :shrug:
Thanks @mfisher87. Then it seems like two questions remain...
- How do we get/determine what the Client ID is for each environment? (
- What should the
.login()
function signature look like? Another option for the new argument name: "cmr_environment", as in:
earthdata.login(strategy="environment", cmr_environment=Env.UAT)
How does that look?
- Maybe we need a mapping from envs (keys) to hostnames and client IDs. These are both solely dependent on the EDL environment/maturity level, to the best of my knowledge.
- I'm really struggling with what I think would be a good parameter name for this. I'll keep thinking!
I'm realizing something after seeing the suggestion cmr_environment
. Changing these URLs is actually changing the Earthdata Login environment, not CMR environment. Should we be linking these two things together so when the user selects UAT, both CMR UAT and EDL UAT are used? Or should the user be able to change both independently? If the latter, maybe cmr_{environment,maturity}
& edl_{environment,maturity}
make sense. If the former, then we need to think about how to communicate that this parameter will change multiple APIs. I'm not opposed to being wordy, like cmr_and_edl_maturity
.
Yeah, I've been thinking of this argument as something that will change both EDL and CMR. Not just one or the other independently. Could be cmr_and_edl_env
?
I'd also be happy with cmr_and_edl_maturity
- Maybe we need a mapping from envs (keys) to hostnames and client IDs. These are both solely dependent on the EDL environment/maturity level, to the best of my knowledge.
I'm actually just not sure where to find the Client ID value to use for UAT.
Is a Client ID something I can find on some Earthdata page now — or, do we need to create a new Earthdata Application (in UAT, and then again in SIT)?
I don't know if any "application" (that term always throws me off when it refers to an integration) was created for UAT/SIT. @betolink I think would know.
Either way, while we're in there we should ensure all earthaccess maintainers are admins of all the EDL integrations :)
@betolink, do you know the answer to the above question about the Client ID?
Sorry for the delayed response @danielfromearth, I think we can ignore client_id
, this is only telling URS which app is requesting the user's info and this one is actually not even earthaccess', I think I took one of NSIDC's apps (yeah, we should perhaps create an earthaccess app in launchpad etc etc)
I like what you suggested
earthdata.login(cmr_environment=Env.UAT)
Or perhaps using a "EARTHACCESS_ENV=UAT"
(default being prod)
then earthaccess will do something similar to what's already doing:
- Look for environment variables, but this time would be
-
EARTHDATA_USERNAME_UAT
-
EARTHDATA_PASSWORD_UAT
-
- Look for a profile in the
.netrc
that points to the UAT EDLmachine urs.uat.earthdata.nasa.gov login `my_uat_user` `my_uat_password`
- Ask the user their credentials and authenticate using UAT EDL
Based on discussions with @betolink and @nikki-t, going to move forward with this signature for now:
earthdata.login(earthdata_env=Env.UAT)
After a bit of digging, I see that we require a client_id when we request a user's profile.
However, why are we including a function to get a user's profile?
I just submitted a question to EDL support regarding the need for a client_id because the documentation does not appear to align with the behavior:
If a user token is provided in the Authorization: Bearer header, then the client_id paremeter is required to identify the calling client application.
However, we are using an access token, not a user token, so the documentation implies to me that we should not need to supply client_id. Unfortunately, this is not the case. When not including client_id (a valid one), I get this error:
{'error': 'invalid_token', 'error_description': 'The required client id is missing from the request'}
So I see the following options for us:
- If EDL deems the error above to represent a bug (i.e., the doc is correct, and when using an access token, a client_id should not be required), wait for them to fix the issue. This is probably the least attractive option (and possibly the least likely, or certainly not timely).
- Register an earthaccess application in both OPS (PROD) and UAT in order to obtain a client_id for each system. BTW, I confirmed that the client_id we're using does not work for UAT, which makes sense.
- Eliminate the
Auth.get_user_profile
method. Again, why is this included in the first place? Do we really need this? Is anybody making use of it? If so, what for?
@chuckwondo, your question about why resonates with me — and the third option sounds reasonable, but I don't really know what the range of considerations would be.
(Should we move this client_id
question to a new issue, and also not wait on it for https://github.com/nsidc/earthaccess/pull/426?)
However, we are using an access token, not a user token, so the documentation implies to me that we should not need to supply client_id.
I see the problem. I incorrectly assumed we use access tokens because when I look at the auth token the key is "access_token"
. However, I see elsewhere in our code that we are actually obtaining user tokens, so the EDL API appears to be working as expected.
@chuckwondo, your question about why resonates with me — and the third option sounds reasonable, but I don't really know what the range of considerations would be.
(Should we move this
client_id
question to a new issue, and also not wait on it for #426?)
We cannot move the client_id question to a new issue because your PR won't work in UAT without a registered app in UAT so that we can obtain a client_id appropriate for UAT. The existing client_id works only in PROD.
We either create a UAT app to get a client_id (we should do the same for PROD too, but that might have other ramifications, which we might want to create a separate issue for), which then means we also need to move client_id to our new System class, OR we get rid of get_user_profile
altogether, along with eliminating client_id altogether.
I think @danielfromearth already tested this in UAT and seemed to work. I'll test with some NSIDC UAT-only datasets and report back. I think the EDL documentation states that it shouldn't work but it does. I'd say we leave it like this for now and we open a new issue to register earthaccess as a client on its own.
I think @danielfromearth already tested this in UAT and seemed to work. I'll test with some NSIDC UAT-only datasets and report back. I think the EDL documentation states that it shouldn't work but it does. I'd say we leave it like this for now and we open a new issue to register earthaccess as a client on its own.
No, get_user_profile
does not work for UAT because the client_id is for PROD. Attempting to do so produces the following error:
{'error': 'invalid_token', 'error_description': 'This application does not exist'}
Ah OK, I was talking about the search and access part not the get_user_profile
itself. Should we wait to merge this until we have valid client_id
for earthaccess or merge with "works except for get_user_profile
"?
We could, but that goes back to my other question. Why do we have get_user_profile
? Is anybody using it, and if so what for? I'd prefer simply to eliminate it because it is the only thing that uses client_id
.
If we can eliminate get_user_profile
, then that eliminates the need for any client_id
.
I thought we used it for the tokens but that's not the case, I feel we'll need it in the future but for now we can eliminate it.
and what about a release after this PR?
My only concern would be the possibility of this being a breaking change since Auth.get_user_profile
is part of the public API, so if some user of earthaccess
is using that method, then we would break their code.
However, I would be surprised if anybody uses that method, so I'm willing to risk pulling the method.
and what about a release after this PR?
Sure, sounds reasonable to me.
My only concern would be the possibility of this being a breaking change since
Auth.get_user_profile
is part of the public API, so if some user ofearthaccess
is using that method, then we would break their code.However, I would be surprised if anybody uses that method, so I'm willing to risk pulling the method.
@betolink, @mfisher87, @jhkennedy, any concern here with removing Auth.get_user_profile
method so we eliminate the need for client_id
?
no concerns, we should get a client_id
and in the future bring this back if needed for other API integrations.