gdown
New feature: Skip existing files when downloading folders
Background
Download tasks sometimes break off unexpectedly. When downloading a single file, re-downloading it from the beginning is acceptable. When downloading a folder, however, re-downloading the files that had already finished before the interruption wastes time.
Update:
- Add a new parameter skip_existed to the function download_folder().
- Print the skipped files if quiet is set to False.
Example Python Snippet:
import os

import gdown

output_dir = "./xxx/your_target_dir"
# Create the target directory if it does not exist yet.
os.makedirs(output_dir, exist_ok=True)
url = "https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxx=sharing"
# skip_existed=True skips files that are already present in output_dir.
gdown.download_folder(url, output=output_dir, remaining_ok=True, quiet=False, skip_existed=True)
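The skipping behaviour described in the update can be sketched independently of gdown. The helper below is a hypothetical illustration, not the code from this pull request: `split_existing` and its arguments are assumed names, and it only decides which files of a folder listing still need to be fetched, printing the skipped ones unless `quiet` is set.

```python
import os
import sys
import tempfile


def split_existing(relative_paths, output_dir, quiet=False):
    """Split a folder listing into (to_download, skipped) by checking the disk.

    Hypothetical helper for illustration; not part of gdown.
    """
    to_download, skipped = [], []
    for rel_path in relative_paths:
        local_path = os.path.join(output_dir, rel_path)
        if os.path.isfile(local_path):
            skipped.append(rel_path)
            if not quiet:
                # Report skipped files on stderr so stdout stays clean.
                print(f"Skipping existing file: {local_path}", file=sys.stderr)
        else:
            to_download.append(rel_path)
    return to_download, skipped


# Demo: "a.txt" already exists in the target directory, "b.txt" does not.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    pending, already_there = split_existing(["a.txt", "b.txt"], demo_dir)
```

A check like this only looks at whether a file exists, so a file that was cut off halfway through would also be skipped; a real implementation may need to write to a temporary name first.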
Thanks! This feature is very useful.
But why not set skip_existed to default to True?
@genghisun Thanks! You are right, and I have changed the default value to True in the new commit.
Hi~ Sorry to bother you again.
Today I realized that single-file download also needs the skip_existed parameter, especially when downloading a list of multiple files, as in the example below. If a download in the middle of the list fails, running the code again re-downloads all the earlier files, which is unnecessary.
import gdown
url_list = [
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
    'https://drive.google.com/file/d/xxxx',
]
for url in url_list:
    gdown.download(url, fuzzy=True, skip_existed=True)
Also, it is more uniform to have this parameter for both download and download_folder.
The implementation can be done by checking whether the output file exists at https://github.com/wkentaro/gdown/blob/main/gdown/download.py#L280.
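As a rough sketch of the existence check suggested here, the wrapper below skips the download when the output file is already on disk. `download_if_missing` and `fake_download` are hypothetical names used only for illustration; the downloader is passed in as a callable so the sketch runs without network access, and it assumes the output path is known before the download starts.

```python
import os
import tempfile


def download_if_missing(url, output, download_fn):
    """Call download_fn(url, output) only when output is not already a file.

    Returns (output, downloaded) where downloaded is False if the file was skipped.
    """
    if os.path.isfile(output):
        return output, False  # already present, nothing downloaded
    download_fn(url, output)
    return output, True


def fake_download(url, output):
    # Stand-in for a real downloader: writes the URL into the output file.
    with open(output, "w") as f:
        f.write(url)


# Demo: the first call creates the file, the second call skips it.
with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "file.bin")
    _, first = download_if_missing("https://example.invalid/x", target, fake_download)
    _, second = download_if_missing("https://example.invalid/x", target, fake_download)
```

With gdown, the callable could be something like `lambda u, o: gdown.download(u, o, fuzzy=True)`. When no output name is given, the file name is only known after the server responds, so a check like this would probably have to live inside the library, as proposed above.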
@genghisun Sounds useful. I have updated download.py; please help check whether it is reliable.
Use resume=True, introduced by https://github.com/wkentaro/gdown/pull/288; it skips downloading files that already exist.
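For the multi-file loop discussed earlier in this thread, using resume=True might look like the sketch below. It assumes an installed gdown version that includes that pull request. `download_all` and `recording_download` are hypothetical names; gdown is imported lazily so the demo at the bottom can run with a stand-in downloader and no network access.

```python
def download_all(urls, download=None):
    """Download each URL in order, passing resume=True so existing files are skipped."""
    if download is None:
        import gdown  # imported lazily; assumes a gdown version with `resume`

        download = gdown.download
    return [download(url, fuzzy=True, resume=True) for url in urls]


# Demo with a stand-in downloader that records how it was called.
calls = []


def recording_download(url, **kwargs):
    calls.append((url, kwargs))
    return url.rsplit("/", 1)[-1]  # pretend the last URL segment is the file name


outputs = download_all(
    ["https://drive.google.com/file/d/xxxx"], download=recording_download
)
```

Calling `download_all(url_list)` without the `download` argument would go through gdown itself; rerunning it after a failure should then only fetch the files that are still missing.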