Cr2met v2 - Chilean precipitation and temperature data
cr2met v2 is a daily gridded dataset of precipitation and temperature covering Chile from 1979 to 2020.
Hi all,
This recipe is currently failing. Each input is a download link for a zipped file, and each zipped file contains a single netCDF file. For example: FILE
In pangeo-forge/staged-recipes#90, it's mentioned that fsspec can handle zipped files.
Is there a way I could help add zipped-file input support to pangeo-forge-recipes?
Thanks!
-Raphael
AFAICT, we don't need any special functionality; it should already work. Your file_pattern function just needs to return URLs that look like this: zip://datapub/model_data/pism1.0_paleo06_6255/snapshots_-10000.000.nc::https://hs.pangaea.de/model/PISM/Albrecht-etal_2019/parameter-ensemble/Part2_pism_paleo_ensemble_v2.zip, or, more generally, zip://path/within/zipfile/to/datafile.nc::https://url/to/zipfile/on/the/internet.
fsspec should be able to open them directly.
You could try this now with this recipe.
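A small helper makes the chained-URL format explicit. The part after `zip://` and before `::` is the path of the member inside the archive. The part after `::` is where the archive itself lives. The name `chained_zip_url` is hypothetical and is not part of fsspec or pangeo-forge-recipes. The example reuses the PANGAEA URL above.

```python
def chained_zip_url(path_in_zip: str, zip_url: str) -> str:
    """Build an fsspec chained URL for a single file inside a remote zip archive.

    Hypothetical helper: "zip://<member>" names the file within the archive,
    and the part after "::" is the location of the archive itself.
    """
    return f"zip://{path_in_zip}::{zip_url}"


url = chained_zip_url(
    "datapub/model_data/pism1.0_paleo06_6255/snapshots_-10000.000.nc",
    "https://hs.pangaea.de/model/PISM/Albrecht-etal_2019/parameter-ensemble/"
    "Part2_pism_paleo_ensemble_v2.zip",
)
print(url)

# With fsspec installed, a URL like this can be passed straight to fsspec.open:
#   import fsspec
#   with fsspec.open(url) as f:
#       data = f.read()
```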
😆 Ryan beat me to it, but just because I'd already typed this up, I believe your format function would just need to look something like this:
def make_filename(time, variable):
    # Each URL points at one netCDF file inside a zip archive served over HTTPS.
    # <NETCDF_FILENAME> is a placeholder for the file name inside each archive.
    http_base = "https://www.cr2.cl/download/cr2met_v2-0"
    if variable == "pr":
        fname = f"zip://<NETCDF_FILENAME>.nc::{http_base}_pr_day_1979_2020/?wpdmdl=28866&ind=XEBr5cgJaEiPZ1OFhMQLzsGEQ2nIIGyaUXHZlZqrWmCl8TFr4qSxI2eBWBXizHHeDZtpy7gsohOq0wPs20kBmsAZNDWjlaaT4SVwXpop6zGrOAOfHBGIo2U59eNpOjT7AuxDSkAuTBTvDIrDXFvDzg"  # noqa: E501
    elif variable == "t2m":
        fname = f"zip://<NETCDF_FILENAME>.nc::{http_base}_t2m_day_1979_2020/?wpdmdl=28864&ind=l5tlDuy3dq_CWqbJ-K3jvoxzm77YdYE7Nph_YyQ0A6j_scgZ-kaugoW2ox85O5hyrpnL0_OOwxFWr7LoODrNgB_F4Jg-7qaXpu_lWox8b9H6w6d7DrY_YJyRRzuU7SVMyCLKCJk-cxCZkcSalzRSPw"  # noqa: E501
    elif variable == "tmax":
        fname = f"zip://<NETCDF_FILENAME>.nc::{http_base}_tmax_day_1979_2020/?wpdmdl=28862&ind=FH-qiDSW-IDlWVbL97QIBIH9pJNC2zf1377t4DNb7arzaTLv8shTVbXZQf9RUmZtYRSpuadjVcWU9MppWLSDWDOvqoESBnsOzcB31o-14ETxSUppjjXDeqfawstEbah8fcIsg7Sj22RSpFvsVbwOzQ"  # noqa: E501
    elif variable == "tmin":
        fname = f"zip://<NETCDF_FILENAME>.nc::{http_base}_tmin_day-_1979_2020/?wpdmdl=28859&ind=TY0Apx4oPcU_XU_P4Tez5FMHTXgdcgQbyukVXiBT-0Sm9JsVwkTR7bS72tdh96ffXB2viQq8-sYBORa3OucO7dtGbckdXr5Dh-2O6ISVCW4NsKOBwSRv3h-wGW0aaSwJKpPTY6UXP9VdNk-y2_V8GQ"  # noqa: E501
    else:
        # Fail loudly instead of hitting an UnboundLocalError on `fname`.
        raise ValueError(f"Unknown variable: {variable!r}")
    return fname
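The `<NETCDF_FILENAME>` placeholder still has to be filled in with the actual member name. Each archive holds a single netCDF file, so one option is to download one archive (for instance through a browser) and list its contents. The stdlib sketch below does this. `netcdf_members` is a hypothetical helper, and the demo archive is built in memory with made-up member names.

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def netcdf_members(archive: Union[str, BinaryIO]) -> List[str]:
    """List the .nc members of a zip archive given as a path or binary file object."""
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".nc")]


# Demonstration with an in-memory archive; in practice, pass the path of a
# downloaded cr2met zip file instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example.nc", b"not really netCDF")
    zf.writestr("README.txt", "notes")
buffer.seek(0)
print(netcdf_members(buffer))
```

The name it reports would replace `<NETCDF_FILENAME>` in the URLs returned by `make_filename`.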
Ah thanks so much for the quick feedback! I'll update the recipe and test it out.
Hey there,
We had some issues with the zipped files. It seems the cr2met API may be restricting downloads to browsers: the only difference between the two requests below is the user-agent header.
For example, this works:
curl -O -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36" https://www.cr2.cl/download/cr2met_v2-0_tmax_month_1979_2019/\?wpdmdl\=28861
but this does not:
curl -O https://www.cr2.cl/download/cr2met_v2-0_tmax_month_1979_2019/\?wpdmdl\=28861
I'm wondering whether @cisaacstern or @rabernat have any suggestions for passing custom headers into fsspec calls. Since there are only ~4 files, one option we were considering is downloading them, transferring them to Azure Blob Storage, and pointing the recipe at that location.
Thanks!
@norlandrhagen, pangeo-forge-recipes passes the fsspec_open_kwargs supplied to a FilePattern as kwargs to fsspec.open. fsspec.open, in turn, passes kwargs to the filesystem instance it creates to access the supplied path. For a path beginning with http:// or https://, that is an HTTPFileSystem. As described in the fsspec docs for HTTPFileSystem, this class accepts a client_kwargs keyword for passing arguments to the next layer down, an aiohttp.ClientSession. Finally, that object accepts a headers kwarg, as documented in the aiohttp client reference.
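The sketch below shows that chain as nested kwargs, assuming they reach `fsspec.open` as described above. The helper name and the header value are hypothetical. One detail matters for the chained `zip://...::https://...` URLs used in this recipe: fsspec applies kwargs that are not keyed by protocol to the *first* (zip) filesystem in the chain. HTTP options therefore need to sit under the `"https"` key.

```python
from typing import Any, Dict


def http_header_kwargs(headers: Dict[str, str], chained: bool = False) -> Dict[str, Any]:
    """Build fsspec.open kwargs that send custom HTTP headers (hypothetical helper).

    HTTPFileSystem forwards client_kwargs to aiohttp.ClientSession, which
    accepts a headers mapping. For chained URLs (zip://...::https://...),
    the HTTP options are keyed by protocol so fsspec routes them to the
    https layer rather than the zip layer.
    """
    http_options = {"client_kwargs": {"headers": dict(headers)}}
    return {"https": http_options} if chained else http_options


headers = {"User-Agent": "example-agent/0.1"}  # hypothetical value
print(http_header_kwargs(headers, chained=True))
# These could then be supplied as, e.g.:
#   FilePattern(make_filename, ..., fsspec_open_kwargs=http_header_kwargs(headers, chained=True))
```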
I'm glad the user-agent stuff has come up.
I think we should actually be setting user-agent to something Pangeo Forge specific to help data providers know when we are crawling their data. (Best practices.)
If a provider is blocking certain user agents, we should probably not work around that by spoofing the user agent field. Instead, we should work with the data provider to understand their policy and (perhaps) get an exception granted for our user agent.
> I think we should actually be setting user-agent to something Pangeo Forge specific
Would this be implemented as a default key:value pair added to fsspec_open_kwargs whenever source file paths begin with http:// or https://?
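To make the question concrete, here is one hypothetical way such a default could work. A helper adds a `User-Agent` header only when the data is fetched over HTTP(S), and it never overrides a header the recipe author already set. The helper name and the `DEFAULT_USER_AGENT` string are placeholders, not existing pangeo-forge-recipes behavior.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical default; the exact string would be a project decision.
DEFAULT_USER_AGENT = "pangeo-forge-recipes"


def with_default_user_agent(url: str, open_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of open_kwargs with a default User-Agent for HTTP(S) sources.

    A User-Agent already supplied by the recipe (in any letter case) is kept.
    """
    kwargs = copy.deepcopy(open_kwargs or {})
    # In a chained URL like "zip://a.nc::https://host/f.zip", the last
    # segment is the remote location of the data.
    remote = url.split("::")[-1]
    protocol = remote.split("://", 1)[0]
    if protocol not in ("http", "https"):
        return kwargs
    # fsspec routes un-keyed kwargs to the first filesystem in a chain, so
    # chained URLs need the HTTP options nested under the protocol name.
    target = kwargs.setdefault(protocol, {}) if "::" in url else kwargs
    headers = target.setdefault("client_kwargs", {}).setdefault("headers", {})
    if not any(key.lower() == "user-agent" for key in headers):
        headers["User-Agent"] = DEFAULT_USER_AGENT
    return kwargs
```

Leaving explicitly set headers untouched would keep recipes that already pass their own headers working unchanged.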
Just sent an email to the data provider to see what they think of an exception for our use case. I'll update when I hear back.
@cam-gerlach, see above for prior discussion of zip file path formatting.