retrieve databundle depends on build cutout settings

Open martacki opened this issue 1 year ago • 8 comments

Checklist

  • [x] I am using the current main branch or the latest release. Please indicate.
  • [x] I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

When the rule retrieve_databundle_light is executed while build_cutout is set to False, it tries to download the file cutouts/cutout-2013-era5.nc, which eventually fails. I'm not sure if this is intentional, but it is very annoying and hard to spot. The build_cutout rule is not even executed at this stage, and the cutout is not needed.

Maybe I'm misinterpreting some intentional behavior here, but I'm sure there is a bug somewhere because retrieve_databundle_light should execute regardless of the build_cutout settings, in my opinion.

Error Message

MissingOutputException in rule retrieve_databundle_light in file */pypsa-earth/Snakefile, line 147:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
cutouts/cutout-2013-era5.nc

martacki avatar Sep 05 '23 15:09 martacki

Not sure if this links to #812

martacki avatar Sep 05 '23 15:09 martacki

Hello! Thanks for posting! Which country were you testing? Could you share the complete log? Sometimes, for regions outside Africa, Google Drive, the only source of those files, limits the number of downloads, which may cause this issue.

davide-f avatar Sep 05 '23 15:09 davide-f

I suppose I have the same issue for Türkiye. The retrieve databundle rule fetches the sandbox links. However, it does not even download the cutout bundles, which have Google Drive links. Instead, it fails directly with an error. I can download the cutout bundles manually, so I believe the download limit should not be the reason for the error. I am also not sure whether it is connected to the build_cutout setting in the config file.

Emre-Yorat89 avatar Sep 11 '23 14:09 Emre-Yorat89

Thanks a lot for reporting, @martacki and @Emre-Yorat89!

I can reproduce the issue for Türkiye (@Emre-Yorat89 thank you so much for providing the detailed analysis of the issue!). The problem is in fact linked to loading from Google Drive and is caused by gdd.download_file_from_google_drive() returning an empty zip file, which leads to further trouble when trying to unzip it.
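A minimal sketch of how such an empty or corrupt archive could be caught before unzipping, using only the Python standard library. The function name is_usable_zip is hypothetical and is not part of pypsa-earth or googledrivedownloader; it only illustrates the kind of check that would make the failure visible earlier.

```python
import os
import zipfile


def is_usable_zip(path):
    """Return True if `path` is a non-empty, readable zip archive with members.

    Hypothetical helper for illustration only.
    """
    # A missing or zero-byte file cannot be a valid archive.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # Rejects files without a zip signature, e.g. an HTML error page.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # An archive without members carries no data to extract.
        if not archive.namelist():
            return False
        # testzip() returns the first corrupt member name, or None.
        return archive.testzip() is None
```

Calling such a check right after the download would let the rule stop with a clear message instead of failing later with a missing output file.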

Not sure if it is connected with a daily quota, as in that case we should get a 403 error, according to the Google documentation. Could it be that Google has changed the behaviour but not updated the docs? 🤔

As for the effect of build_cutout, setting build_cutout: true bypasses loading the cutout, which is currently the only data type loaded from Google Drive instead of Zenodo.
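The bypass described here can be pictured with the following sketch. It is an assumption about the general mechanism, not the actual pypsa-earth code: the function select_bundles, the key disable_by_opt, the source field and the bundle name bundle_other_data are all hypothetical, chosen only to show how an enabled option could remove a bundle from the download list.

```python
def select_bundles(bundles, enable):
    """Return the names of bundles that still need to be downloaded.

    Hypothetical illustration: a bundle is skipped when any option
    listed under its `disable_by_opt` key is switched on in `enable`.
    """
    selected = []
    for name, settings in bundles.items():
        disabling_options = settings.get("disable_by_opt", [])
        if any(enable.get(opt, False) for opt in disabling_options):
            continue  # the data will be built locally, so no download
        selected.append(name)
    return selected


# Illustrative data only; the real bundle definitions live in
# configs/bundle_config.yaml and may be structured differently.
bundles = {
    "bundle_other_data": {"source": "zenodo"},
    "bundle_cutouts_asia": {
        "source": "gdrive",
        "disable_by_opt": ["build_cutout"],
    },
}

print(select_bundles(bundles, {"build_cutout": True}))
print(select_bundles(bundles, {"build_cutout": False}))
```

With build_cutout switched on, only the Zenodo-hosted bundle remains in the list, so the Google Drive download is never attempted.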

As a temporary fix, the cutout can be loaded manually using the URLs specified in configs/bundle_config.yaml.

ekatef avatar Sep 11 '23 18:09 ekatef

Hello, I ran a couple of simple experiments with the googledrivedownloader package. On the first try, the downloaded file was a corrupt zip file. Changing the sharing option from "Restricted" to "Anyone with the link" on Google Drive solved the issue. Hopefully this is also the case for our problem.

Emre-Yorat89 avatar Sep 11 '23 20:09 Emre-Yorat89

Thanks for testing, @Emre-Yorat89! I have checked the "General access" options for bundle_cutouts_northamerica and bundle_cutouts_asia, and it looks like sharing by link is on: "Anyone with the link" corresponds to Viewer rights, which should also allow downloading the file... Still, I feel that your idea points in the right direction.

ekatef avatar Sep 11 '23 21:09 ekatef

Update after some additional testing: the cause of the trouble does in fact seem to be the number of downloads. While an initial request to Google Drive returns status 200 (== everything is fine), an authorised request

https://github.com/ndrplz/google-drive-downloader/blob/be1aba9e2e43b2375475f19d8214ca50a8621bd6/google_drive_downloader/google_drive_downloader.py#L58-L61

returns 429, which means exactly "too many requests".

For the time being, a quick fix is to load a cutout file manually using the links provided in /configs/bundle_config.yaml

It would probably be nice to check the server's status response and add a meaningful warning or error.
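A rough sketch of what such a check could look like, assuming the HTTP status code of the download request is available to the caller. The function describe_download_status and its messages are hypothetical and not taken from pypsa-earth or googledrivedownloader; only the status codes 200, 403 and 429 come from the discussion above.

```python
from http import HTTPStatus


def describe_download_status(status_code):
    """Map an HTTP status code to (ok, message) for a download request.

    Hypothetical helper for illustration only.
    """
    if status_code == HTTPStatus.OK:
        return True, "Download request accepted (200)."
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:
        return False, (
            "Server returned 429 (too many requests). Try again later or "
            "download the file manually using the URLs in "
            "configs/bundle_config.yaml."
        )
    if status_code == HTTPStatus.FORBIDDEN:
        return False, (
            "Server returned 403 (forbidden). Check the sharing settings "
            "of the file or a possible download quota."
        )
    return False, f"Unexpected HTTP status {status_code}; download may be incomplete."


ok, message = describe_download_status(429)
if not ok:
    print(f"WARNING: {message}")
```

Since the initial request can return 200 while the follow-up request returns 429, the check would need to be applied to the response that actually delivers the file.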

ekatef avatar Sep 13 '23 20:09 ekatef

Hello @martacki! Thank you for reporting this issue. It has been investigated in more detail in #866 and fixed by #911, so data retrieval should work properly now. Do you have any additional comments, or can we count this issue as completed? 🙂

ekatef avatar Jan 06 '24 12:01 ekatef

Closing this issue, as there have been a number of fixes and the description here has become outdated. Unfortunately, some trouble may still occur when downloading a cutout. A description of the problem, its cause and a possible solution are given in #1078

ekatef avatar Aug 09 '24 16:08 ekatef