earthaccess
Downloads for AU_SI12_NRT_R04 incorrect
I am trying to download granules of AU_SI12_NRT_R04 using earthaccess.download but the results are incorrect. Files are created on disk but they do not seem to contain the data.
import earthaccess
results = earthaccess.search_data(short_name='AU_SI12_NRT_R04')
results = sorted(results, key=lambda x: x['meta']['revision-date'], reverse=True)
earthaccess.login()
files = earthaccess.download(results, "/tmp/test")
Granules found: 14
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 11/25/2023
Using environment variables for EDL
Getting 14 granules, approx download size: 0.0 GB
QUEUEING TASKS | : 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 2188.52it/s]
PROCESSING TASKS | : 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:02<00:00, 10.15it/s]
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 160877.41it/s]
This results in files of ~4.1K in size in the indicated /tmp/test directory. I expect files ~126M in size:
$ ls -lah /tmp/test/
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_P04_20230926.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230913.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230914.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230915.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230916.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230917.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230918.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230919.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230920.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230921.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230922.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230923.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230924.he5
-rw-rw-r-- 1 trst2284 trst2284 4.1K Sep 26 13:15 AMSR_U2_L3_SeaIce12km_R04_20230925.he5
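A quick way to confirm what those 4.1K files actually contain is to look at their first bytes. The sketch below is not part of earthaccess; `classify_download` and `report` are hypothetical helper names. It assumes the HDF5 signature sits at offset 0, which is the common case but does not hold for files with a user block.

```python
from pathlib import Path

# The 8-byte signature that begins an HDF5 file (assuming no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_download(path):
    """Return 'hdf5', 'html', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    # An HTML page saved under a .he5 name usually starts with a doctype or <html>.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def report(directory, pattern="*.he5"):
    """Map each matching file name in `directory` to its detected type."""
    return {
        p.name: classify_download(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Calling `report("/tmp/test")` on the directory above would show at a glance whether any file is real HDF5 or saved HTML.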
I haven't dug into this very deeply yet, but I found the code in earthaccess.store that is responsible for downloading files and set a breakpoint here:
(Pdb) url
'https://lance.nsstc.nasa.gov/amsr2-science/data/level3/seaice12/R04/hdfeos5/AMSR_U2_L3_SeaIce12km_R04_20230923.he5'
(Pdb) pp r.raw
<urllib3.response.HTTPResponse object at 0x7fd661351120>
(Pdb) pp r.content
(Pdb) (b'<!DOCTYPE html>\n<!--[if lt IE 7]><html class="no-js lt-ie9 lt-ie8 lt-ie7'
b'"> <![endif]-->\n<!--[if IE 7]><html class="no-js lt-ie9 lt-ie8"> <![endi'
b'f]-->\n<!--[if IE 8]><html class="no-js lt-ie9"> <![endif]-->\n<!--[if gt '
b'IE 8]><!--><html lang="en" class="no-js"><!--<![endif]-->\n <head>\n <'
b'meta charset="utf-8">\n <meta http-equiv="X-UA-Compatible" content="IE'
b'=edge,chrome=1">\n <title>Earthdata Login</title>\n <meta name="desc'
b'ription" content="Earthdata Login">\n <meta name="viewport" content="w'
b'idth=device-width, initial-scale=1.0">\n\n <!-- Google Tag Manager -->\n'
b" <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push(\n\n {'gtm.s"
b"tart': new Date().getTime(),event:'gtm.js'}\n\n );var f=d.getElementsBy"
b"TagName(s)[0],\n j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j"
b".async=true;j.src=\n 'https://www.googletagmanager.com/gtm.js?id='+i"
b"+dl;f.parentNode.insertBefore(j,f);\n })(window,document,'script','dat"
b"aLayer','GTM-WNP7MLF');</script>\n <!-- End Google Tag Manager -->\n\n "
b' <link href="https://cdn.earthdata.nasa.gov/eui/1.1.3/stylesheets/applicati'
b'on.css" rel="stylesheet" />\n <link rel="stylesheet" href="/assets/app'
b'lication-432b3917d4a41042c0fd963eba859548ef2993f5ed7a0dca4bdb446fdf807556.cs'
b's" media="all" />\n <!--[if IE 7]>\n <link rel="stylesheet" href="'
b'/assets/font-awesome-ie7.min.css">\n <![endif]-->\n <link href="//ne'
b'tdna.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css" rel="styl'
b'esheet">\n <link href=\'https://fonts.googleapis.com/css?family=Source+'
b'Sans+Pro:300,700\' rel=\'stylesheet\' type=\'text/css\'>\n <meta name="'
b'csrf-param" content="authenticity_token" />\n<meta name="csrf-token" cont'
b'ent="n8PZFKi4E7nfgviamfqUIY_0XanXV1LhoF9ZAbNt1-MUe2mYXetV1EKdM1sGW6RZFUZHiDY'
b'PycYyLZO6XSE6tg" />\n \n\n <!-- Grid background: http://subtlepattern'
b's.com/graphy/ -->\n </head>\n <body class="oauth authorize" data-turboli'
b'nks-eval=false>\n\n <!-- Google Tag Manager (noscript) -->\n <noscrip'
b't>\n <iframe src="https://www.googletagmanager.com/ns.html?id=GTM-WN'
b'P7MLF"\n height="0" width="0" style="display:none;visi'
b'bility:hidden"></iframe>\n </noscript>\n <!-- End Google Tag Manager'
b' (noscript) -->\n\n <header id="earthdata-tophat2" style="height: 32px;'
b'"></header>\n <!--[if lt IE 7]>\n <p class="chromeframe">You are u'
b'sing an <strong>outdated</strong> browser. Please <a href="http://browsehapp'
b'y.com/">upgrade your browser</a> or <a href="http://www.google.com/chromefra'
b'me/?redirect=true">activate Google Chrome Frame</a> to improve your experien'
b'ce.</p>\n <![endif]-->\n <div class="container">\n <header role='
b'"banner">\n <div id="masthead-logo">\n <h1><a class="ir" href="/">Eart'
b'hdata Login</a></h1>\n <span class="eui-badge badge daac">Earthdata Lo'
b'gin</span>\n </div>\n <a id="hamburger" href="#"><img title="Mobile Menu'
b'" alt="Three horizontal lines stacked" src="/assets/hamburger-68c8505066427f'
b'3e3f6ee40b24cfd3c9f7c0fe93ee298b9046564637262115fa.png" /></a>\n <nav ro'
b'le="navigation" class="masthead">\n\n <div id="hide">\n <ul>\n '
b' <li><strong><a href="/documentation">Documentation</a></strong></li>\n '
b' </ul>\n </div>\n </nav>\n</header>\n\n \n\n\n\n\n\n\n\n '
b' <section id="callout-login">\n <div class="client-login">\n <img cla'
b'ss="client-image" border="1" src="/app_image_image/19071" />\n <br>\n '
b' <h3 class="client-description">\n \n </h3>\n\n </div>\n <form'
b' id="login" action="/login" accept-charset="UTF-8" method="post"><input name'
b'="utf8" type="hidden" value="✓" autocomplete="off" /><input type="hid'
b'den" name="authenticity_token" value="A4tAYbi8xu0b3pfSWilc6te24QugNelEl0w-xr'
b'qa-1uIM_DtTe-AgIbBXBPFiGySTQT7KkFtcmMFPvR9VNYWDg" autocomplete="off" />\n'
b' <p><label for="username">Username</label><i class="fa fa-question-circle f'
b'a-question-circle--blue user-name" title="Login using either your Username o'
b'r Email Address"></i><input type="text" name="username" id="username" autofo'
b'cus="autofocus" class="default" /></p>\n <p><label for="password">Passwo'
b'rd</label><br /><input type="password" name="password" id="password" autocom'
b'plete="off" /></p>\n\n <p><input type="hidden" name="client_id" id="clien'
b't_id" value="mACp-6quKkkPZ3FiVl2Rng" autocomplete="off" /></p>\n <p><inp'
b'ut type="hidden" name="redirect_uri" id="redirect_uri" value="https://lance.'
b'itsc.uah.edu/urs-redirect" autocomplete="off" /></p> <p><input type="hi'
b'dden" name="response_type" id="response_type" value="code" autocomplete="off'
b'" /></p>\n <p><input type="hidden" name="state" id="state" value="aH'
b'R0cHM6Ly9sYW5jZS5pdHNjLnVhaC5lZHUvYW1zcjItc2NpZW5jZS9kYXRhL2xldmVsMy9zZWFpY2'
b'UxMi9SMDQvaGRmZW9zNS9BTVNSX1UyX0wzX1NlYUljZTEya21fUjA0XzIwMjMwOTIzLmhlNQ" au'
b'tocomplete="off" /></p>\n <p><input type="checkbox" name="stay_in" i'
b'd="stay_in" value="1" checked="checked" /> <label for="stay_in">Stay signed '
b'in (this is a private workstation)</label></p>\n\n <p class="button-with-'
b'notes">\n <input type="submit" name="commit" value="Log in" class="eui'
b'-btn--round eui-btn--green" data-disable-with="Log in" />\n <a class="'
b'eui-btn--round eui-btn--blue" href="/users/new?client_id=mACp-6quKkkPZ3FiVl2'
b'Rng&redirect_uri=https%3A%2F%2Flance.itsc.uah.edu%2Furs-redirect&res'
b'ponse_type=code&state=aHR0cHM6Ly9sYW5jZS5pdHNjLnVhaC5lZHUvYW1zcjItc2NpZW'
b'5jZS9kYXRhL2xldmVsMy9zZWFpY2UxMi9SMDQvaGRmZW9zNS9BTVNSX1UyX0wzX1NlYUljZTEya2'
b'1fUjA0XzIwMjMwOTIzLmhlNQ">Register</a>\n </p>\n <p class="form-instructi'
b'ons">\n <em class="icon-question-sign"></em>\n <a class="" href="/re'
b'trieve_info">I don’t remember my username</a>\n <br /><em class='
b'"icon-question-sign"></em>\n <a class="" href="/reset_passwords/new">I'
b' don’t remember my password</a>\n <br />\n <em class="icon-que'
b'stion-sign"></em>\n <a href="javascript:feedback.showForm();" title = '
b"'Need Help? Click on the Feedback button to request help'>Help</a>\n </p"
b'>\n</form>\n<aside class="govt-msg">\n <div class="nasa-logo"></div>\n <p>'
b'<strong>Why must I register?</strong></p>\n <p>\n The Earthdata Login '
b'provides a single mechanism for user registration and profile management for'
b' all EOSDIS system components (DAACs, Tools, Services).\n Your Earthda'
b'ta login also helps the EOSDIS program better understand the usage of EOSDIS'
b' services to improve user experience through customization of tools and impr'
b'ovement of services.\n EOSDIS data are openly available to all and fre'
b'e of charge except where governed by international agreements.\n </p>\n</'
b'aside>\n\n</section>\n<section id="cta">\n <h3>Get single sign-on access to'
b' all your favorite EOSDIS sites</h3>\n <a class="eui-btn--round eui-'
b'btn--blue" href="/users/new?client_id=mACp-6quKkkPZ3FiVl2Rng&redirect_ur'
b'i=https%3A%2F%2Flance.itsc.uah.edu%2Furs-redirect&response_type=code&'
b';state=aHR0cHM6Ly9sYW5jZS5pdHNjLnVhaC5lZHUvYW1zcjItc2NpZW5jZS9kYXRhL2xldmVsM'
b'y9zZWFpY2UxMi9SMDQvaGRmZW9zNS9BTVNSX1UyX0wzX1NlYUljZTEya21fUjA0XzIwMjMwOTIzL'
b'mhlNQ">Register for a Profile</a>\n</section>\n<div class="govt-warning eu'
b'i-info-box">\n <div class="warning-desktop">\n <p>\n <strong>\n '
b' Protection and maintenance of user profile information is described '
b'in\n <a href="https://www.nasa.gov/about/highlights/HP_Privacy.htm'
b'l">NASA\'s Web Privacy Policy.</a>\n </strong> \n </p>\n </di'
b'v>\n <div class="warning-mobile">\n <p>\n <strong>\n Protect'
b'ion and maintenance of user profile information is described in\n '
b' <a href="https://www.nasa.gov/about/highlights/HP_Privacy.html">NASA'
b"'s Web Privacy Policy.</a>\n </strong> \n </p>\n </div>\n <div cla"
b'ss="warning-mobile-mini">\n <strong>\n US Govt Property. Unauthori'
b'zed use subject to prosecution. Use subject to monitoring per\n <a h'
b'ref="https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPD&c=2810&s=1E">NPD2810<'
b'/a>.\n </strong>\n </div>\n</div>\n\n\n </div>\n <footer role="co'
b'ntentinfo">\n <h3>For questions regarding the EOSDIS Earthdata Login, pl'
b'ease contact <a href="javascript:feedback.showForm();" title="Earthdata Supp'
b'ort form">Earthdata Support</a></h3>\n <ul>\n <li class="version badge'
b' eui-badge--md">V 4.180.0\n</li>\n <li><a href="/">Home</a></li>\n <l'
b'i><a href="/users/new">Register</a></li>\n <li><a title="NASA Home" hr'
b'ef="http://www.nasa.gov">NASA</a></li>\n </ul>\n <p>NASA Official: Steph'
b'en Berrick</p>\n</footer>\n\n <script src="/assets/application-26ef2d894'
b'36774b62209186400ab34914d3661de4b009da594e25783d8575bad.js"></script>\n '
b' <script type="text/javascript">\n $(window).scroll(function(e){\n '
b' parallax();\n });\n function parallax(){\n var scrolled = $(wi'
b"ndow).scrollTop();\n $('#content').css('background-position', 'right"
b' \' + -(scrolled*0.25)+\'px \');\n }\n </script>\n <script src="h'
b'ttps://cdn.earthdata.nasa.gov/tophat2/tophat2.js" id="earthdata-tophat-scrip'
b't" data-show-fbm="true" data-show-status="true" data-status-api-url="https:/'
b'/status.earthdata.nasa.gov/api/v1/notifications"></script>\n <script t'
b'ype="text/javascript" src="https://fbm.earthdata.nasa.gov/for/URS4/feedback.'
b'js"></script>\n <script type="text/javascript">\n feedback.init();'
b'\n </script>\n <script type="text/javascript">\n setTimeout(fu'
b'nction()\n {var a=document.createElement("script"); var b='
b'document.getElementsByTagName("script")[0];\n a.src=do'
b'cument.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0013/'
b'2090.js?"+Math.floor(new Date().getTime()/3600000);\n '
b'a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}\n '
b' , 1);\n </script>\n\n <!-- BEGIN: DAP Google Analytics '
b' -->\n <script language="javascript" id="_fed_an_ua_tag" src="https://'
b'dap.digitalgov.gov/Universal-Federated-Analytics-Min.js?agency=NASA&subagenc'
b'y=GSFC&dclink=true"></script>\n <!-- END: DAP Google Analytics -->\n\n '
b' \n </body>\n</html>\n')
Looks like I'm getting the HTML response for EDL login, maybe I'm not doing something right with auth?
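One way to keep a login page from being written to disk as if it were data is to inspect the response before saving it. This is only a sketch under assumptions: `looks_like_login_page` is a hypothetical helper, not an earthaccess function, and the title string it checks for is taken from the page dumped above.

```python
def looks_like_login_page(content_type, first_bytes):
    """Heuristically decide whether a response is an EDL login page, not data.

    content_type: value of the Content-Type header, or None if absent.
    first_bytes:  the first chunk of the response body.
    """
    ctype = (content_type or "").lower()
    # A granule download should not come back as an HTML document.
    if ctype.startswith("text/html"):
        return True
    # Fall back to the title seen in the dumped page.
    return b"<title>Earthdata Login</title>" in first_bytes
```

With `requests`, the check could be run on `resp.headers.get("Content-Type")` and the first chunk from `resp.iter_content()` before opening the output file, raising an error instead of saving when it returns `True`.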
I think there's an issue with the auth endpoint for these granules? After clicking a data link in my browser and being redirected to EDL, I entered my credentials, and then was redirected to https://lance.nsstc.nasa.gov/urs-redirect which gave 403. After doing that, I'm able to go back to the CMR search results and click the data links and see the files.
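The hidden `state` field in the login form dumped above looks like unpadded URL-safe base64. Assuming that is what it is, decoding it shows which URL the login flow intends to send the client back to. `decode_state` is a hypothetical helper name; the `STATE` value is copied from the form.

```python
import base64
from urllib.parse import urlparse


def decode_state(state):
    """Decode an unpadded URL-safe base64 string into text."""
    padded = state + "=" * (-len(state) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Value of the hidden "state" input in the dumped login page.
STATE = (
    "aHR0cHM6Ly9sYW5jZS5pdHNjLnVhaC5lZHUvYW1zcjItc2NpZW5jZS9kYXRhL2xldmVsMy9zZWFpY2"
    "UxMi9SMDQvaGRmZW9zNS9BTVNSX1UyX0wzX1NlYUljZTEya21fUjA0XzIwMjMwOTIzLmhlNQ"
)

target = decode_state(STATE)
print(target)
print(urlparse(target).hostname)
```

The decoded URL points at lance.itsc.uah.edu even though the request went to lance.nsstc.nasa.gov, and the form's `redirect_uri` is on the same uah.edu host. That host switch may be related to the redirect problem described here, though nothing in this output confirms it.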
After logging in once, I get a message like "so and so has been added to your authorized EDL applications". Can you try logging in as the account in question and then clicking the data links in your browser? I hope once the authorization step is done you may have different results.
I have manually downloaded the files with the earthdata account I'm using to authenticate with earthaccess. The results are the same from earthaccess's side.
I'm able to download the granules with some code adapted from qgreenland:
import os

import earthaccess
import requests

_URS_COOKIE = "urs_user_already_logged"
_CHUNK_SIZE = 8 * 1024


def _get_earthdata_creds():
    if not os.environ.get("EARTHDATA_USERNAME"):
        raise RuntimeError("Environment variable EARTHDATA_USERNAME must be defined.")
    if not os.environ.get("EARTHDATA_PASSWORD"):
        raise RuntimeError("Environment variable EARTHDATA_PASSWORD must be defined.")

    return (
        os.environ["EARTHDATA_USERNAME"],
        os.environ["EARTHDATA_PASSWORD"],
    )


def _create_earthdata_authenticated_session(s=None, *, hosts: list[str], verify):
    if not s:
        s = requests.session()

    for host in hosts:
        resp = s.get(
            host,
            # We only want to inspect the redirect, not follow it yet:
            allow_redirects=False,
            # We don't want to accidentally fetch any data:
            stream=True,
            verify=verify,
        )
        # Copy the headers so they can be used case-insensitively after the
        # response is closed.
        headers = {k.lower(): v for k, v in resp.headers.items()}
        resp.close()

        redirected = resp.status_code == 302
        redirected_to_urs = (
            redirected and "urs.earthdata.nasa.gov" in headers["location"]
        )

        if not redirected_to_urs:
            print(f"Host {host} did not redirect to URS -- continuing without auth.")
            return s

        auth_resp = s.get(
            headers["location"],
            # Don't download data!
            stream=True,
            auth=_get_earthdata_creds(),
        )
        if not (auth_resp.ok and s.cookies.get(_URS_COOKIE) == "yes"):
            # Read the error body before closing the response.
            msg = f"Authentication with Earthdata Login failed with:\n{auth_resp.text}"
            auth_resp.close()
            raise RuntimeError(msg)
        # Close the auth response (not `resp`, which is already closed) so the
        # body is never downloaded.
        auth_resp.close()

        print(f"Authenticated for {host} with Earthdata Login.")

    return s


def _download_lance_files():
    results = earthaccess.search_data(short_name="AU_SI12_NRT_R04")
    for granule in results:
        # There are two links for each granule. one for lance.nsstc.nasa.gov and
        # the other for lance.itsc.uah.edu. The first one is fine.
        url = granule.data_links(access="external")[0]
        session = _create_earthdata_authenticated_session(hosts=[url], verify=True)
        with session.get(
            url,
            timeout=60,
            stream=True,
            headers={"User-Agent": "NSIDC-dev-trst2284"},
        ) as resp:
            # Fail loudly on an HTTP error instead of writing the error body to disk.
            resp.raise_for_status()
            # e.g., https://lance.nsstc.nasa.gov/.../AMSR_U2_L3_SeaIce12km_P04_20230926.he5
            # -> AMSR_U2_L3_SeaIce12km_P04_20230926.he5
            fn = url.split("/")[-1]
            with open(f"/tmp/test/{fn}", "wb") as f:
                for chunk in resp.iter_content(chunk_size=_CHUNK_SIZE):
                    f.write(chunk)
            print(f"wrote {fn}")


if __name__ == "__main__":
    _download_lance_files()
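The workaround relies on the first link being the lance.nsstc.nasa.gov one. If the order of links is not guaranteed, selecting by hostname may be more robust. A minimal sketch, where `pick_link` is a hypothetical helper and the example URLs are illustrative only:

```python
from urllib.parse import urlparse


def pick_link(links, preferred_host):
    """Return the first URL in `links` whose hostname equals `preferred_host`."""
    for link in links:
        if urlparse(link).hostname == preferred_host:
            return link
    raise LookupError(f"no link for host {preferred_host!r} in {links!r}")


# Illustrative links for one granule, one per host.
example_links = [
    "https://lance.itsc.uah.edu/some/path/granule.he5",
    "https://lance.nsstc.nasa.gov/some/path/granule.he5",
]
print(pick_link(example_links, "lance.nsstc.nasa.gov"))
```

In the script above, this would replace the `[0]` index on the result of `granule.data_links(access="external")`.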