Kiwix becomes unresponsive when trying to download the 78GB Wikipedia

Open Emnolope opened this issue 6 years ago • 45 comments

Honestly, this was expected. Is there a better way for Kiwix to handle downloads of such large files? When I had doubts, I checked the file size in Windows Explorer and it is indeed growing; however, because of the large size, the computer has difficulty using the full capacity of the network card.

Basically, there should be a cleaner way for Kiwix to handle such large downloads.

Emnolope avatar Dec 23 '18 00:12 Emnolope

@Emnolope This is a bit strange. This happens directly after starting the download? Or later? Is Kiwix Desktop then unresponsive during the whole download?

kelson42 avatar Dec 23 '18 15:12 kelson42

I found I was able to get it to work, by starting the download, waiting for the program to crash, then restarting the program without restarting the download.

Emnolope avatar Dec 25 '18 02:12 Emnolope

When I do this, it becomes at least somewhat stable.

Emnolope avatar Dec 25 '18 03:12 Emnolope

This shouldn't happen. The download itself is handled by a different process, and the Kiwix UI just updates the information every second. Which version of kiwix-desktop are you using? On Windows or Linux?

mgautierfr avatar Jan 07 '19 10:01 mgautierfr

@Emnolope Have you been able to reproduce the problem with the last beta?

kelson42 avatar May 22 '19 12:05 kelson42

@Emnolope I'm pretty convinced we have fixed all of this in the latest betas. If the problem still happens, please reopen the ticket.

kelson42 avatar Jun 11 '19 15:06 kelson42

@mgautierfr @jetownfeve21 I have to reopen this ticket as it still does not work properly with RC1. I downloaded the latest version of WPDE (with pictures). The Kiwix UI freezes from time to time during a long/big download (and Ubuntu also complains about it). It also freezes at the very end of the download process.

kelson42 avatar Aug 21 '19 10:08 kelson42

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar Nov 27 '19 04:11 stale[bot]

I too experience absurd GUI response delays of multiple seconds up to a minute when saturating my internet connection by downloading multiple things. There is almost no CPU usage (probably some IO thing going on).

Does the GUI loop interact synchronously with the download processes?

AllanWegan avatar Jan 31 '21 23:01 AllanWegan

@AllanWegan It does indeed look like part of the process still runs in the main UI loop. It is unclear so far which part.

kelson42 avatar Feb 01 '21 00:02 kelson42

Maybe it would be easiest and best to remove the download ability from Kiwix and instead tell people to use download managers like Internet Download Manager, Internet Download Accelerator (IDA), Free Download Manager, and many others, or torrenting. I've never had any problems with them. I have already downloaded the whole English Wikipedia 5 times (about 400 GB).

ghost avatar Feb 16 '21 19:02 ghost

@GoblinLegislator If we remove a feature each time we have a bug... we could stop the project right now :)

kelson42 avatar Feb 17 '21 01:02 kelson42

I tested downloading with the latest kiwix app on macOS and could not reproduce. Is there a platform where this shows up with a recent release?

adamlamar avatar Dec 21 '22 02:12 adamlamar

@adamlamar This happens with Kiwix Desktop for Linux and Windows, i.e. this code base. Kiwix Desktop for macOS has its code base in the "kiwix/apple" repository.

kelson42 avatar Dec 21 '22 05:12 kelson42

Thanks @kelson42 , I didn't realize they were different codebases.

Testing on Linux, I didn't see an issue with hangs. The scrolling and interface remain somewhat responsive even with a lot of concurrent downloads (5-10).

But Windows very clearly hangs after pressing Download on a large zim.

[Screenshot 2022-12-21 at 1 46 21 PM]

I believe the problem lies in the libkiwix downloader. It makes requests to the aria rpc endpoint, and my hypothesis is that aria doesn't reply immediately to some of them. At first I thought the problem was with the startDownload function, but the screenshot clearly shows that 800KB had already been downloaded. So the aria rpc may be hanging at some point during the download setup or a status check, and because the libcurl request in libkiwix is blocking, that ultimately hangs the UI thread.

The other thing I noticed is aria does preallocation of files at the beginning of the download. There are some warnings about how this process works quickly with modern filesystems but can take a long time on ext3/fat32. My Windows VM should be running NTFS, but it still took a long time for me. I could see the preallocation in action where we have only downloaded ~400MB but the file is ~11GB:

[Screenshot 2022-12-21 at 1 58 43 PM]

The hang seemed to coincide with the file preallocation. Maybe aria takes out a lock when performing preallocation, and the rpc endpoints wait for that lock to be released?

I expect that users with fat32 filesystems on removable HDDs would have an even worse hang since they'd have to wait the full duration of the preallocation process.

I think the ultimate fix for something like this is to make the aria rpc endpoint requests fully async so there is no chance they will become blocking. Or, run the downloader in its own thread in kiwix-desktop.
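
To make the second option concrete, here is a minimal sketch of a dedicated downloader thread. It uses only the C++ standard library as a stand-in for Qt, and `DownloaderWorker` and `post` are hypothetical names, not existing kiwix-desktop or libkiwix code. The point is only that the caller hands over a task and returns immediately, while the potentially blocking call runs elsewhere.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a downloader thread: tasks posted from the UI
// thread run here, so a slow rpc call never blocks the caller.
class DownloaderWorker {
public:
    DownloaderWorker() : m_thread([this] { run(); }) {}

    ~DownloaderWorker() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        m_thread.join();  // remaining tasks are drained before the thread exits
    }

    // Returns immediately; the task runs later on the worker thread.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_tasks.empty()) return;  // only reached when stopping
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task();  // the blocking call happens outside the lock
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_tasks;
    bool m_stop = false;
    std::thread m_thread;  // declared last so the other members exist first
};
```

Tasks still run one after another on the worker, so a slow task delays the tasks queued behind it; only the posting thread is protected.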

adamlamar avatar Dec 22 '22 00:12 adamlamar

@adamlamar Thank you for this in-depth analysis. I had the opportunity to talk to @mgautierfr about this ticket yesterday. Everything he said seems to be confirmed by your latest comment, in particular the fact that he seems to experience UI freezes at the start of the download (and not at the end). Anyway, it seems pretty clear to me that dealing asynchronously with aria2c is a good opportunity to fix this very old ticket.

kelson42 avatar Dec 22 '22 06:12 kelson42

I've also seen a few small hangs on Linux at the start of a download, but most of the time they don't appear. I haven't seen hangs at the end of a download for "small" zim files (a few GB), but the 78GB download is still running.

We already pass an option to not preallocate files (https://github.com/kiwix/libkiwix/blob/master/src/aria2.cpp#L88), so that should not be the issue. But we may have missed something on Windows.

The only point I see where it could block is indeed the aria rpc call. I have also managed to break the download system once or twice by quickly launching/starting/pausing/canceling downloads:

  • One crash
  • One file I cannot download anymore: "Download is not available" when I click on the download button.

I think the ultimate fix for something like this is to make the aria rpc endpoint requests fully async so there is no chance they will become blocking. Or, run the downloader in its own thread in kiwix-desktop.

We have already moved the whole downloading process (aria) into another process. The rpc call is local only, so latency should not be an issue here. What you suggest would indeed fix the issue, but I think it would be better to know why the RPC is hanging and fix that.

mgautierfr avatar Dec 22 '22 13:12 mgautierfr

I think the challenge with the current approach is that any IO or blocking work (even CPU-intensive tasks) can cause unresponsiveness in the UI. The UI thread can be called upon many times per second, and even a 1ms HTTP request to the aria rpc endpoint will technically block UI updates. In some cases there won't be a perceivable delay, but it's still happening.

It's good to have the actual download and disk operations happening in another thread (or in this case, another process). But it does seem difficult to maintain the constraint that aria must respond to every rpc request quickly enough to keep the kiwix-desktop UI responsive. Even if we fix it now, will aria reintroduce excess latency in a later release? Is low-latency response a priority for aria, or is a response time of a few seconds even considered a bug?

I looked more at the libkiwix downloader and introducing async seems hard because the API would need to change. For example, we couldn't return Download* from startDownload because we wouldn't have the download ID returned from aria#addUri.
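
A rough illustration of why the signature would have to change, using only the standard library. `blockingAddUri` and `startDownloadAsync` are hypothetical names that simulate the rpc call; they are not the libkiwix API.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for the blocking aria addUri rpc call; the real
// call yields a download ID once aria replies.
std::string blockingAddUri(const std::string& uri) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated latency
    return "gid-for-" + uri;
}

// Asynchronous variant: the caller gets a future right away instead of a
// ready-to-use Download*, because the ID is not known yet.
std::future<std::string> startDownloadAsync(const std::string& uri) {
    return std::async(std::launch::async, blockingAddUri, uri);
}
```

Every caller would then need to hold on to the future and deal with the ID arriving later, which is the API change mentioned above.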

I'm a little rusty on Qt but I'll spend today looking at how to introduce a downloader thread. That seems the least disruptive change AFAICT. We can always look to fix the other issues too, like the rpc hang in aria, but if the downloader runs in its own thread, those issues will be less perceivable to the user.

adamlamar avatar Dec 22 '22 18:12 adamlamar

Coming back from holidays. Sorry for the delay.

Have you managed to get something working with threading, @adamlamar? We already have a thread to download the catalog data in kiwix-desktop. Maybe you can base your work on it.

mgautierfr avatar Jan 03 '23 13:01 mgautierfr

No worries @mgautierfr, I have been away as well. Happy holidays! I will check out the catalog approach as well.

I was able to get something working with the downloader running in a QThread and using signals/slots for events. The hang is hard to reproduce in my Linux dev environment, but the download functionality seems to work as expected. I still need to complete a few more things, including:

  • Disabling the download button after starting the download, otherwise the user might press more than once
  • Copy the approach for other operations besides start download
  • Test on windows
  • General cleanup

I should be able to dedicate a good amount of time to completing this next week.

adamlamar avatar Jan 04 '23 18:01 adamlamar

Hey all, so I finished backgrounding the rest of the operations in this branch. Overall, the high level design is something like:

  • Run a new QThread using the BackgroundDownloader. Any slots invoked on the BackgroundDownloader are run on this new thread (not the UI thread)
  • The ContentManager sends signals from the UI thread to the BackgroundDownloader thread.
  • When needed, the BackgroundDownloader sends signals back to the ContentManager to confirm operations, such as starting or canceling a download
  • The BackgroundDownloader::updateStatus() is invoked once per second by a QTimer, and updates the internal m_status map with information about the download
  • The BackgroundDownloader::getDownloadStatus() method can be called from the ContentManager on the UI thread to get the status of any particular download
  • Since BackgroundDownloader can be called from two different threads, reading from the m_status map and the downloader operations are protected by a ReadWrite lock
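
The last two points can be sketched as follows, with `std::shared_mutex` standing in for Qt's read/write lock. `StatusCache` and `DownloadStatus` are simplified, hypothetical types for illustration and may not match what the branch actually stores.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical, simplified status record.
struct DownloadStatus {
    std::string state;
    long long completedBytes = 0;
};

class StatusCache {
public:
    // Called from the downloader thread after each (possibly slow) rpc poll.
    void update(const std::string& id, DownloadStatus status) {
        std::unique_lock<std::shared_mutex> lock(m_mutex);  // exclusive writer
        m_status[id] = std::move(status);
    }

    // Called from the UI thread; only reads memory, never waits on aria.
    std::optional<DownloadStatus> get(const std::string& id) const {
        std::shared_lock<std::shared_mutex> lock(m_mutex);  // shared reader
        auto it = m_status.find(id);
        if (it == m_status.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex m_mutex;
    std::map<std::string, DownloadStatus> m_status;
};
```

The lock is only held while the map is touched, so the UI thread never waits for longer than a map lookup as long as the rpc call itself is made outside the lock.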

This works ok overall, but I've found the UI doesn't respond on Windows while the file preallocation is occurring. This is because the updateStatus() method blocks the event loop in BackgroundDownloader, and so other operations (like starting a second download) don't respond until the event loop is unblocked.

One way I found around this is to set split=1 on the aria2c config. On Windows, even if file preallocation is turned off, aria2c will still perform file preallocation when split>1. Unfortunately, the tradeoff with split=1 is that only one connection per download can run concurrently. By default, split=5, and setting to 1 has a big negative impact on download speeds.

I think the best way to work around this aria2c problem is to set a timeout on the libcurl request to the aria RPC endpoint. If it takes more than (say) 100ms, we could assume that file preallocation is occurring and return a status representing that. However, preallocation can take a long time (minutes or longer) on slow disks and there is no status information available (such as the percentage complete). And the download doesn't even start until file preallocation has completed.
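
A sketch of the timeout idea. With libcurl itself, the request timeout would presumably be set through `CURLOPT_TIMEOUT_MS`; to stay dependency-free, the example below shows the same decision with `std::future`. `PollResult` and `pollReply` are hypothetical names, and the 100ms figure is only the example value from above.

```cpp
#include <chrono>
#include <future>
#include <string>

// Hypothetical status values for illustration only.
enum class PollResult { Ready, ProbablyPreallocating };

// Waits at most `timeout` for a pending rpc reply. If none arrives, the
// caller can show a "preparing file" state and poll again later.
// Assumes the future comes from an asynchronous request, not a deferred one.
PollResult pollReply(std::future<std::string>& reply,
                     std::chrono::milliseconds timeout) {
    if (reply.wait_for(timeout) == std::future_status::ready)
        return PollResult::Ready;
    return PollResult::ProbablyPreallocating;
}
```

A timeout only tells us that aria has not answered yet; treating that as "preallocating" is a guess, so the status shown to the user should be worded accordingly.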

Ideally there would be a downloader library that was smarter about preallocation on specific filesystems. For example it could allocate chunks on-demand or have an allocator and downloader threads run at the same time. Not sure if there are other library options which would have this high level behavior available.

Let me know what you think.

adamlamar avatar Jan 24 '23 22:01 adamlamar

@adamlamar Thank you for the update, can you please create a PR?

kelson42 avatar Jan 25 '23 01:01 kelson42

This works ok overall, but I've found the UI doesn't respond on Windows while the file preallocation is occurring. This is because the updateStatus() method blocks the event loop in BackgroundDownloader, and so other operations (like starting a second download) don't respond until the event loop is unblocked.

Which method exactly is blocking in updateStatus()? Handling this use case is exactly the purpose of moving the downloading to a different thread. The downloading itself is already done in a different thread (even a different process). We want the thread not to block when there is lag in the communication with the download process. So the idea is to NOT hold the lock when doing the rpc call. I see that in BackgroundDownloader::startDownload you hold a lock when you do the actual rpc call startDownload. Do we really need it? What shared value is modified?

One way I found around this is to set split=1 on the aria2c config. On Windows, even if file preallocation is turned off, aria2c will still perform file preallocation when split>1. Unfortunately, the tradeoff with split=1 is that only one connection per download can run concurrently. By default, split=5, and setting to 1 has a big negative impact on download speeds.

That is surprising. There is an issue on the aria2c side (https://github.com/aria2/aria2/issues/1396) which says that the two options are not related. Maybe you can share more about your investigation there.

mgautierfr avatar Jan 25 '23 10:01 mgautierfr

I think I have found the root cause : In libkiwix's aria2.cpp, we use a lock to prevent a race condition when we could reuse the same curl context (https://github.com/kiwix/libkiwix/blob/main/src/aria2.cpp#L137-L156).

While this makes the aria2 wrapper thread-safe (we can call it from different threads safely), it is not really multithread compliant (we cannot do several requests in parallel). So by definition, if the addUri method (which is used to start a download) takes time, all other requests will be blocked, whether they are made from the same thread or not.

We have to make the aria2 wrapper fully multithreaded and also make the libkiwix::Downloader thread-safe/compliant. Then it would be possible to use it correctly in a multithreaded client (kiwix-desktop) without such a bottleneck.
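
To illustrate the difference, here is a simplified sketch with hypothetical types (`RpcContext` stands in for the curl context; none of this is the actual aria2.cpp code). The first client serializes every request behind one lock; the second gives each request its own context.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for a curl context; the real wrapper uses libcurl.
struct RpcContext {
    std::string perform(const std::string& request) { return "reply:" + request; }
};

// Serialized: one shared context guarded by a mutex, so a slow request
// makes every other caller wait, whichever thread it comes from.
class SerializedClient {
public:
    std::string call(const std::string& request) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_shared.perform(request);
    }

private:
    std::mutex m_mutex;
    RpcContext m_shared;
};

// Parallel-friendly: each call owns its context, so no lock is needed and
// a slow request only delays its own caller.
class PerCallClient {
public:
    std::string call(const std::string& request) const {
        RpcContext context;  // created per request, never shared
        return context.perform(request);
    }
};
```

Creating a context per request has its own cost, so pooling contexts might be a reasonable middle ground; that is a design question this sketch does not answer.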

mgautierfr avatar Jan 25 '23 13:01 mgautierfr

@mgautierfr I believe this line is blocking the BackgroundDownloader's event loop: https://github.com/kiwix/kiwix-desktop/pull/919/files#diff-de6d6dc21894f626a8d8aa19ae0974692384776ff9ea5796987397fd1dcf2832R111

So the idea is to NOT get the lock when doing rpc call

The RPC call is outside of the lock. The overall event loop looks like this:

Thread 1 - UI Thread
Runs code from many classes, including ContentManager
Has its own event loop

Thread 2 - parentless QThread started in BackgroundDownloader
Runs code from BackgroundDownloader only
  event loop runs one of:
  - updateStatus() (once per second as invoked by the QTimer)
  - startDownload()
  - completeDownload()
  - pauseDownload()
  - resumeDownload()
  - cancelDownload()

When updateStatus() blocks, the whole event loop blocks and received signals queue up behind it. Due to file preallocation, this could last for minutes. The user sees the delay when they attempt the next action, say downloading another zim. The program does not go into Not Responding (as it did before), but the UI does not act correctly (e.g. the download does not start after pressing the Download text).

in BackgroundDownloader::startDownload you have a lock

That's true. I don't believe the startDownload call normally blocks, but I can remove the locking around mp_downloader since there is no concurrent access (only event loop access). The only concurrent access occurs against m_status.

My interpretation of https://github.com/aria2/aria2/issues/1851, https://github.com/aria2/aria2/issues/1842, and https://github.com/aria2/aria2/issues/1396 is that file preallocation will always occur if split>1, and split=5 by default. Setting file-allocation=trunc might help when NTFS is used, but the user will still see the delay if FAT/exFAT is used (e.g. removable disk). This seems to be true in my testing - when I set split=1 manually, there is no delay starting downloads, but they run much slower.
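
For anyone who wants to repeat the experiment, the two settings can be expressed as aria2c command-line options. The helper below is a hypothetical sketch: the flags are real aria2c options, but whether libkiwix passes exactly this set is an assumption.

```cpp
#include <string>
#include <vector>

// Builds an illustrative aria2c command line for a given split value.
std::vector<std::string> aria2cArgs(int split) {
    return {
        "aria2c",
        "--enable-rpc",
        "--file-allocation=none",            // ask aria2c not to preallocate
        "--split=" + std::to_string(split)   // connections per download
    };
}
```

Calling `aria2cArgs(1)` gives the configuration without the startup delay in my testing, and `aria2cArgs(5)` matches the default split value.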

adamlamar avatar Jan 25 '23 18:01 adamlamar

On the lock in aria2.cpp, I don't know if it would make a difference in kiwix-desktop because there is only one thread trying to invoke the downloader at any given time. So while it could be an overall improvement, I am not sure if it will solve the specific problem here.

Since we cannot always prevent aria2 from blocking during file preallocation, I will look into timing out the libcurl request and let you know.

adamlamar avatar Jan 25 '23 18:01 adamlamar