PixivUtil2 icon indicating copy to clipboard operation
PixivUtil2 copied to clipboard

Any Idea of making this multithread?

Open 2403772980ygy opened this issue 4 years ago • 11 comments

It is not hard to guess that the program gets all data and download links in a init process, why not put them in a queue and download them multithreaded?

2403772980ygy avatar Apr 29 '20 15:04 2403772980ygy

  1. Current UI doesn't support it.
  2. Pixiv might ban you due to unnatural access, see #504
  3. I'm too lazy to do it.

Nandaka avatar Apr 29 '20 15:04 Nandaka

Haha, word, man!

bluerthanever avatar Apr 29 '20 16:04 bluerthanever

OK.

  1. the UI is not that hard to modify, actually......
  2. you can control the thread count, also, your internet explorer(whatever it is) download multithreaded.
  3. LOL, I'll do some experiments on my machine and maybe can help.

Anyways, maybe I should do it myself by changing the current download function into a queue push, and add another part to download.

2403772980ygy avatar Apr 30 '20 00:04 2403772980ygy

Also bear with me, for I'm new to this project.

2403772980ygy avatar Apr 30 '20 00:04 2403772980ygy

Thinking out loud here, I was just browsing the issues.. Outside of what Nandaka mentioned- The largest possible problem that I can see is to do with the database. SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster. Some sort of system may need to be implemented to queue & dequeue statements/transactions in a separate thread/process, but then you get into issues with the GIL & sharing objects between the separate Python threads/processes... But if that were to be implemented, it would likely add a decent amount of complexity around how the download processes interact with the DB. I'm trying to think of some ideas real quick, but I can't come up with a simple solution around it.

jwshields avatar May 07 '20 08:05 jwshields

SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster.

I think you need to use lock, it will impact the performance of course. For UI, I think need to use curses, but I'm not sure the support on windows before Win10

Nandaka avatar May 07 '20 09:05 Nandaka

I didn't think of SQLite problems, the lock should only deal with operations outside of downloading and should not do with downloading because it is the main thing we need to accelerate using multithreading, maybe this can help with SQLite problems.

As for curses, I completely agree, but I haven't come up with a decent design yet.

My main concern is that the downloading need some rework to integrate the new UI and multithreading.

2403772980ygy avatar May 07 '20 13:05 2403772980ygy

if you are using curses, the default print will be hidden, right? I have made some changes on the latest commit to allow you to pass a notifier from the caller, so it can handle the messages and show it to the ui. I think multi threading is more useful for batch downloader (option b)

Nandaka avatar May 07 '20 16:05 Nandaka

Duplicate of https://github.com/Nandaka/PixivUtil2/issues/504

You could stay with the current printline mode, just use the python progress library to indicate how many posts of possible posts have downloaded with a textual progress bar instead of logging each post in detail serially.

This would make things harder to debug though, so I would leave in the existing process post which ties logging to the actual download procedure.

PixivUtil performs no processing that would benefit from multithreading (see also python GIL) except for making parallel requests, so it would suffice to 1)Pull the downloading procedure into the caller, make multiple async 2)Let the caller show progress 3)Perform process post on each result in the caller after the download completes.

But TBH once you no longer have a download backlog, the savings of this are pretty negligable.

bbappserver avatar May 08 '20 00:05 bbappserver

if we let the caller update the process bar and We only need to download one image this looks kinda awkward, I suggest using the downloaded/total size to show the process in the bar.

2403772980ygy avatar May 08 '20 01:05 2403772980ygy

@bbappserver I'm more thinking of the context running multiple jobs in parallel (e.g. downloading mix of by member id or by tags), so inside the thread it just call the appropriate method.

https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L157 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L82 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L106

Currently, the UI printout and logging is kind of tied together, using curses won't mess with it assuming the thread id is included in the logging.

As for the notifier, it is up to the dev to include how much details they want to include, currently it just pass start and stop message.

Nandaka avatar May 08 '20 01:05 Nandaka