pypsa-earth
retrieve databundle depends on build cutout settings
Checklist
- [x] I am using the current main branch or the latest release. Please indicate.
- [x] I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.
Describe the Bug
When the rule retrieve_databundle_light is executed while build_cutout is set to False, it tries to download the file cutouts/cutout-2013-era5, which eventually fails.
I'm not sure if this is intentional, but it is very annoying and hard to spot. build_cutout is not even executed at this stage, and the cutout is not needed.
Maybe I'm misinterpreting some intentional behavior here, but I'm sure there is a bug somewhere because, in my opinion, retrieve_databundle_light should execute regardless of the build_cutout setting.
Error Message
MissingOutputException in rule retrieve_databundle_light in file */pypsa-earth/Snakefile, line 147:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
cutouts/cutout-2013-era5.nc
Not sure if this links to #812
Hello! Thanks for posting! Which country were you testing? Could you share the complete log? Sometimes, for regions outside Africa, Google Drive (the only source of those files) limits the number of downloads, which may cause that issue.
I suppose I have the same issue for Türkiye. The retrieve databundle rule fetches the sandbox links. However, it does not even download the cutout bundles, which have Google Drive links; it directly gives the error below. I can download the cutout bundles manually, so I believe the download limit should not be the reason for the error. I am also not sure if it is connected to the build_cutout setting in the config file.
Thanks a lot for reporting, @martacki and @Emre-Yorat89!
I can reproduce the issue for Türkiye (@Emre-Yorat89, thank you so much for providing the detailed analysis of the issue!). The problem is in fact linked to loading from Google Drive: gdd.download_file_from_google_drive() returns an empty zip file, which causes further trouble when trying to unzip it.
Not sure if it is connected with a daily quota, as in that case we should get a 403 error, according to the Google documentation. Could it be that Google has changed the behaviour but not updated the docs? 🤔
As for the effect of build_cutout, setting build_cutout: true bypasses loading the cutout, which is currently the only data type loaded from Google Drive instead of Zenodo.
As a temporary fix, I suggest loading the cutout manually using the URLs specified in configs/bundle_config.yaml.
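The empty zip file mentioned above could be caught before unzipping. Below is a minimal sketch using only the Python standard library; the helper name is_usable_zip and the file names are hypothetical and not part of pypsa-earth or googledrivedownloader.

```python
import os
import tempfile
import zipfile


def is_usable_zip(path):
    """Return True only if path is a non-empty, readable zip with at least one member."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None if all are fine
        return bool(archive.namelist()) and archive.testzip() is None


# Small self-check in a temporary directory (file names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.zip")
    open(empty, "wb").close()  # mimics a zero-byte download

    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("cutout-2013-era5.nc", b"dummy")

    print(is_usable_zip(empty))
    print(is_usable_zip(good))
```

A check like this, run right after the download step, would let the rule fail with a clear message instead of a MissingOutputException.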
Hello,
I have made a couple of simple experiments with the googledrivedownloader package using the code below. When I first tried it, the downloaded file was a corrupt zip. Changing the sharing option from "Restricted" to "Anyone with the link" on Google Drive solved the issue. Hopefully this is also the case for our problem.
Thanks for testing, @Emre-Yorat89! I have checked the "General access" options for bundle_cutouts_northamerica and bundle_cutouts_asia, and it looks like sharing by link is on: Anyone with link corresponds to Viewer rights, which should also allow downloading the file... Still, I feel that your idea points in the right direction.
Update after some additional testing: the cause of the trouble does in fact seem to be the number of downloads. While an initial request to Google Drive returns status 200 (== everything is fine), an authorised request
https://github.com/ndrplz/google-drive-downloader/blob/be1aba9e2e43b2375475f19d8214ca50a8621bd6/google_drive_downloader/google_drive_downloader.py#L58-L61
returns 429, which means exactly "too many requests".
For the time being, a quick fix is to load a cutout file manually using the links provided in /configs/bundle_config.yaml.
It would probably be nice to add a check of the server status response and emit a meaningful warning or error.
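As an illustration of such a check, here is a minimal sketch that turns a status code into a readable message. The function name describe_download_status and the message wording are assumptions made for this example; it is not the fix that was later merged, and it does not perform any network request itself.

```python
from http import HTTPStatus


def describe_download_status(status_code):
    """Map an HTTP status code to an (ok, message) pair suitable for logging."""
    if status_code == HTTPStatus.OK:
        return True, "Server answered 200 (OK)."
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:
        return False, (
            "Server answered 429 (too many requests). Consider loading the "
            "cutout manually using the links in configs/bundle_config.yaml."
        )
    try:
        # Look up the standard reason phrase for known codes
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        phrase = "unknown status"
    return False, f"Server answered {status_code} ({phrase})."


# Example: the status seen for the authorised request in this thread
ok, message = describe_download_status(429)
print(ok, message)
```

The caller would pass in the status_code attribute of whatever response object the download step produces and log the message as a warning or raise an error when ok is False.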
Hello @martacki! Thank you for reporting this issue. It has been investigated in more detail in #866 and fixed by #911, so data retrieval should work properly now. Do you have any additional comments, or can we count this issue as completed? 🙂
Closing this issue, as there have been a number of fixes and the description here has become outdated. Unfortunately, some trouble may still occur when downloading a cutout. A description of the problem, its cause, and a possible solution are given in #1078.