ffhq-dataset
Using pydrive with user credentials for authenticated download
Unfortunately, your code performs an anonymous download. I tried on several consecutive days and always got a quota-exceeded error, which left me unable to download the dataset.
This pull request, which uses code adapted from the FFHQ-Aging repo, uses user credentials to download the dataset.
The only requirement is to follow the pydrive quickstart to get the client_secrets.json file and place it in the same directory as download_ffhq.py. You can then request pydrive Google authentication by appending the --pydrive command line option.
So, for example, to download the 1024x1024 images, you simply run:
python3 download_ffhq.py -i --pydrive
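For readers unfamiliar with pydrive, the authenticated path can be sketched roughly as follows. This is a minimal illustration and not the exact code of this pull request: `make_drive`, `fetch_file`, and the `cmd_auth` parameter are hypothetical names, while `GoogleAuth`, `LocalWebserverAuth`, `CommandLineAuth`, `CreateFile`, and `GetContentFile` are pydrive calls. It assumes client_secrets.json sits in the working directory.

```python
def make_drive(cmd_auth=False):
    """Return an authenticated pydrive client (hypothetical helper)."""
    # Imported lazily so the module still loads when pydrive is not installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()  # looks for client_secrets.json in the working directory
    if cmd_auth:
        gauth.CommandLineAuth()     # prints a URL and asks for a verification code
    else:
        gauth.LocalWebserverAuth()  # opens a browser and waits on a local port
    return GoogleDrive(gauth)


def fetch_file(drive, file_id, dst_path):
    """Download one Drive file by its ID with the user's credentials."""
    drive_file = drive.CreateFile({'id': file_id})
    drive_file.GetContentFile(dst_path)
    return dst_path
```

Because the request is made on behalf of a signed-in user, it should not depend on the anonymous download quota in the same way.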
The code makes several attempts to download each file. Without that logic, which is inspired by yours, I got some httplib2.error.ServerNotFoundError: Unable to find the server at www.googleapis.com exceptions. Apparently, the exception is not raised when the download is retried a second time.
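The retry idea can be sketched generically as below. The helper name `download_with_retries` and its parameters are illustrative assumptions, not the names used in this pull request; `download_once` stands for any zero-argument callable that performs a single download attempt.

```python
import time


def download_with_retries(download_once, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call download_once() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(1, attempts + 1):
        try:
            return download_once()
        except retry_on:
            if attempt == attempts:
                raise          # out of attempts: let the last error propagate
            time.sleep(delay)  # short pause before trying again
```

In practice `retry_on` would probably be narrowed to transient errors such as httplib2.error.ServerNotFoundError, so that genuine failures still surface immediately.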
I only tested downloading the images (the command line above), but since the other downloads also go through the download_files function, I hope it works for them as well.
Note that, for some reason, after some time (on the order of hours), the script may try to reauthenticate and fail. Relaunching the script makes it continue downloading; I successfully downloaded the 90 GB of 1024x1024 images this way.
This was very helpful for me. I was able to download the 89 GB of 1024x1024 images, with a restart after a few hours. As an additional step, I had to replace
    # Google Drive virus checker nag.
    links = [html.unescape(link) for link in data_str.split('"') if 'export=download' in link]
    if len(links) == 1:
        if attempts_left:
            file_url = requests.compat.urljoin(file_url, links[0])
            continue
with
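    # Google Drive virus checker nag.
    # Raw string so that \? is a regex escape, not an invalid string escape.
    file_id = re.findall(r'uc\?id=(.*)&', data_str)
    if len(file_id) == 1:
        file_id = file_id[0]
        if attempts_left:
            # API_KEY is a placeholder for your own Drive API key.
            file_url = 'https://www.googleapis.com/drive/v3/files/{}/?key=API_KEY&alt=media'.format(file_id)
            continue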
This is because the virus checker page changed, so the code for handling it doesn't work anymore. To make this work, I had to follow the instructions in the pydrive quickstart link given above (i.e., use this PR and get a client_secrets.json from the Drive API). The new virus checker workaround uses an API key that you can create in a GCP API project, similar to how you get the client_secrets.json file. You can also use the OAuth key.
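As a standalone illustration of that workaround, the sketch below pulls the file ID out of the virus checker page and builds the Drive API URL. The function name `media_url_from_nag_page` and the sample HTML are assumptions made for the example. The character class `[\w-]+` is also an assumption (that Drive file IDs contain only letters, digits, `_` and `-`); it is slightly stricter than the `(.*)` pattern in the snippet above.

```python
import re

# Same URL shape as in the replacement code above, with the key as a parameter.
DRIVE_MEDIA_URL = 'https://www.googleapis.com/drive/v3/files/{}/?key={}&alt=media'


def media_url_from_nag_page(page_html, api_key):
    """Return the API download URL for the single file ID found in the page, else None."""
    file_ids = set(re.findall(r'uc\?id=([\w-]+)', page_html))
    if len(file_ids) != 1:
        return None  # no ID, or several different ones: do not guess
    return DRIVE_MEDIA_URL.format(file_ids.pop(), api_key)


# Usage with a made-up page fragment and a placeholder key:
sample = '<a href="/uc?id=1AbC_d-9&export=download">Download anyway</a>'
print(media_url_from_nag_page(sample, 'API_KEY'))
```

Collecting the matches in a set means the same ID repeated several times in the page still counts as one file.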
I had to run the download script with the --cmd_auth flag and use a "Desktop" setting instead of "Web application" in the Drive API to make it work. Here is a screenshot of my Drive API page.