Download multiple files with N threads and a progress bar

Open wxguy opened this issue 1 year ago • 4 comments

The library looks promising.

As the title says, is it possible to download multiple files (option to limit N threads) simultaneously with progress bar?

I would be happy to see an example of this.

wxguy avatar Apr 04 '24 17:04 wxguy

Yes, it's possible:

from pypdl import Downloader

# Start the first download without blocking so the second can begin right away.
dl1 = Downloader()
dl1.start('http://example.com/file1.txt', segments=10, block=False, display=False)

dl2 = Downloader()
dl2.start('http://example.com/file2.txt', display=False)
# .....

This should allow multiple files to be downloaded simultaneously. The only caveat is that if both downloads show a progress bar, they can flood the console, because pypdl uses a simple progress bar. You may therefore need to create a custom progress bar to display both outputs; all the values, such as progress, are accessible from the downloader object (dl.progress). Alternatively, you can download sequentially in a loop, in which case you can reuse the same downloader object and set display to True, which will give you a progress bar. Each file is downloaded as multiple segments, which are then combined to form the full download.
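A minimal sketch of the sequential variant, assuming only that the downloader object has a start() method accepting the display and block keyword arguments used in this thread. The helper name download_sequentially is made up for illustration and is not part of pypdl.

```python
from typing import Iterable, List, Protocol


class SupportsStart(Protocol):
    """Any object with a pypdl-style start() method (assumed shape)."""

    def start(self, url: str, **kwargs) -> None: ...


def download_sequentially(downloader: SupportsStart, urls: Iterable[str]) -> List[str]:
    """Download each URL in turn, reusing the same downloader object.

    Returns the URLs that were attempted, in order.
    """
    attempted = []
    for url in urls:
        # block=True waits for this file before moving on to the next one;
        # display=True lets the library draw its own progress bar.
        downloader.start(url, display=True, block=True)
        attempted.append(url)
    return attempted
```

With pypdl installed, this would presumably be called as download_sequentially(Downloader(), ['http://example.com/file1.txt', 'http://example.com/file2.txt']).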

For more advanced examples, check the README files.

mjishnu avatar Apr 04 '24 18:04 mjishnu

I think it is a bad idea to create multiple instances to download multiple files. What if I need to download more than 500 files? Rather, I would prefer additional arguments to the start method of the Downloader object, like this:

from pypdl import Downloader

# Proposed API: a list of URLs plus a `threads` limit
urls = [url1, url2, url3, ..., url500]

dl = Downloader()
dl.start(urls, segments=10, block=False, display=False, threads=10)

The threads argument limits the number of parallel downloads at a time.

It would be a good idea to implement this in your package.

wxguy avatar Apr 05 '24 17:04 wxguy

Yes, as you said, it would be a bad idea when downloading a lot of files. It is going to be hard to incorporate multiple downloads into the start method. I think a better approach is to create a factory method with a parameter like max_instances (threads), which limits the number of instances at a time and reuses those instances when there are more downloads than max_instances.
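To make the idea concrete, here is a rough sketch of a bounded pool of reusable instances. It is not pypdl's actual implementation: run_with_instance_pool and make_downloader are hypothetical names, and the only assumption about a downloader is a start() method taking the display and block keywords mentioned above.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_with_instance_pool(
    urls: List[str],
    make_downloader: Callable[[], object],
    max_instances: int = 4,
) -> List[str]:
    """Download every URL using at most `max_instances` downloader objects."""
    # Create the instances once; idle ones wait in a thread-safe queue.
    idle: "queue.Queue[object]" = queue.Queue()
    for _ in range(max_instances):
        idle.put(make_downloader())

    def worker(url: str) -> str:
        dl = idle.get()  # borrow an idle instance
        try:
            # Blocking call: this worker thread is busy until the file is done.
            dl.start(url, display=False, block=True)
            return url
        finally:
            idle.put(dl)  # hand the instance back for reuse

    # One worker thread per instance, so at most `max_instances` downloads
    # run at any moment, however long the URL list is.
    with ThreadPoolExecutor(max_workers=max_instances) as pool:
        return list(pool.map(worker, urls))
```

The point of the queue is that 500 URLs still create only max_instances objects; each one is handed back and reused as soon as its current download finishes.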

mjishnu avatar Apr 06 '24 04:04 mjishnu

I have completed the implementation of PypdlFactory; you can check it out here: https://test.pypi.org/project/pypdl/. Can you test it and share your findings if there are any bugs? By the way, the class has been renamed (Downloader -> Pypdl); you can check the updated docs in the v1.4.0 branch.

mjishnu avatar Apr 14 '24 16:04 mjishnu

How about allowing custom progress bars? This could be solved with the position parameter of tqdm.

wLxCvcY20V avatar May 19 '24 03:05 wLxCvcY20V

We can already use custom progress bars: just set the display parameter to False and use the progress attribute.

In the case of PypdlFactory, it combines values from the attributes of the Pypdl instances and produces a progress bar.
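As an illustration of combining values from several instances, the sketch below computes one overall percentage from the size and current_size attributes. The helper combined_progress is hypothetical, and the real factory may weight or compute things differently.

```python
from typing import Iterable


def combined_progress(downloaders: Iterable[object]) -> float:
    """Return overall progress in percent across several downloads."""
    total = 0
    done = 0
    for dl in downloaders:
        size = getattr(dl, "size", None)
        if not size:
            continue  # size not known yet, so skip this download for now
        total += size
        # Clamp so a download never counts for more than its own size.
        done += min(getattr(dl, "current_size", 0) or 0, size)
    return 100.0 * done / total if total else 0.0
```

A custom bar could poll this function in a loop and redraw a single line, instead of printing one bar per download.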

mjishnu avatar May 19 '24 05:05 mjishnu

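import time
from tqdm import tqdm

# `downloader` is a running, non-blocking download; `i` is its index (bar row).
with tqdm(
    total=downloader.size,
    desc=f"Download {i}",
    unit='B',
    unit_scale=True,
    unit_divisor=1024,
    miniters=1,
    position=i
) as bar:
    previous = 0
    while not downloader.completed:
        current = downloader.current_size  # read once so no bytes are skipped
        bar.update(current - previous)
        previous = current
        time.sleep(0.1)
    # Account for bytes received after the last poll.
    bar.update(downloader.current_size - previous)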

This should do the job. I would appreciate it if someone tested it.

wLxCvcY20V avatar May 20 '24 00:05 wLxCvcY20V

This feature has been added in v1.4.0.

mjishnu avatar Jun 04 '24 18:06 mjishnu