Freeze while downloading

Open thany opened this issue 2 years ago • 16 comments

Describe the bug

After letting it download for a few hours, the UI freezes to the point that Windows adds (Not Responding) into the title bar, dimming the main window. It really doesn't respond to anything, obviously.

However, the current download is still going. It's definitely downloading new files and adding them to disk. It's like only the GUI is frozen, and all the rest is fine. But it's also demanding 100% utilisation of one CPU core, so that's still not great.

I don't know exactly when this happened. I just found the program in this state after letting it do its thing for a couple of hours. So I think I'll let it carry on, as terminating the program will probably force every downloaded file to get downloaded again (it was like that in 2.x versions anyway - not sure if this has been fixed, or even if it can be fixed).

While typing this up, I also see the GUI "thawing" from time to time, only to freeze up almost immediately after. So it's not a permanent freeze. Something must be super busy. I wonder what part of the GUI needs that much processing power...

To illustrate: image

I don't think memory usage like that is normal either, given that it's running on a database (and assuming the database isn't entirely loaded into memory)...

Environment Information

  • Windows 10 21H2
  • Software version 3.13.1
  • Are you running a compiled version or from source? - Compiled

To Reproduce (optional)

Hard to say, as stated before. Let it download a long time.

For all but the most trivial of issues, please attach the latest log file.

Yeah, can I send it privately please, if necessary? There's a bit of naughtiness in there 😘

thany avatar Feb 01 '22 12:02 thany

That kind of CPU and memory usage is definitely not normal. I can't imagine what would lead to that. The GUI runs on its own thread, so anything doing any substantial work should not affect it.

You can email the log privately to [email protected]. I'm the only one who can access it there. I doubt that whatever is making this happen will show up in the log, but it couldn't hurt.

Closing the app shouldn't have an effect on any downloaded files except for ones that are currently in the process of downloading. Everything is saved to the database immediately and very little is kept in memory without at least being backed up in the database.

MalloyDelacroix avatar Feb 01 '22 13:02 MalloyDelacroix

Come to think of it, maybe it has to do with the way Windows handles resolution/display changes. Since I'm accessing the program on a remote VM on the home server over RDP (so I can let it run and shut down my computer), the remote changes resolution to make it fit the RDP session.

It might be possible that that's what is causing DFR to get confused in the GUI thread? Just speculating here, because I too wouldn't know how a GUI could work normally for hours and then just freeze up for no obvious reason, without even interacting with it.

Also, logs have been sent. It even correctly rotated logs, so there's also a .log.1 file which probably just contains more of the same.

thany avatar Feb 01 '22 14:02 thany

Update: I've updated to 3.13.2 and this time I've left the RDP session connected.

It's not freezing yet, even after a few hours of purring along. This doesn't mean it's solved though - it might still need something to get it properly fixed.

Regardless of RDP sessions and changing display settings, there's still something going on that might be weird. While downloads are getting along perfectly fine, I see the CPU time spiking every ~5 seconds to somewhere around 15% for the duration of probably a second (it's hard to guess short-lasting spikes using task manager 🤷‍♂️). This 15% is across all three CPU cores, so it's close to 50% on a single core. That seems a bit much for what it's doing.

It's also gradually building up memory usage. This might be indicative of a memory leak, or it might be by design (database cache). But it's at 1.2GB as I'm typing this, which does seem like a lot. Shortly after starting I've seen it at 500MB, a little later I've seen 700MB, and now this. Memory is of course there to be used, but a buildup like that is unusual.

And I know it's an unfair comparison, but you can see Total Commander sitting there using almost nothing from the memory, and that one's been running for days, sometimes weeks on end, without ever closing it.

thany avatar Feb 02 '22 14:02 thany

Are you by any chance downloading a long video? The only thing that I can think of which would use up that much CPU and memory is FFMPEG combining a large video.

What do you see when you click on the arrow next to DFR in task manager?

zacker150 avatar Feb 03 '22 04:02 zacker150

Are you by any chance downloading a long video?

No, small videos and images were getting saved one after the other, so it likely wasn't doing a very large download.

What do you see when you click on the arrow next to DFR in task manager?

Nothing interesting. Just another entry "Downloader for Reddit". Just the one, no ffmpeg.

Because leaving the session connected hasn't triggered this problem so far, I really suspect that a change in resolution or other display settings is causing the GUI to get blocked up.

thany avatar Feb 03 '22 09:02 thany

@MalloyDelacroix It's happening again on 3.14.1. No network I/O, no disk I/O, but fully blasting the CPU on all cores to within availability. It's not completely frozen as in "Not Responding" - the GUI is still responding as it should do (so at least the program appears to be written well, using threads and whatall).

This is an overview (from ProcessHacker) of the offending threads that are blocking up the CPU: image

I hope this tells you more than it does me 😀

Here's another interesting tab: image

Look at the number of I/O bytes read and compare to the total time. That's just excessive. I'm not seeing the SSD going nuts, so I'm guessing it's some sort of internal I/O going on. Maybe memory access or something. It's still a lot. And 6.7 trillion CPU cycles, for a download program, am I reading that correctly? Wow 😀

The number of handles appears to be ever decreasing, which seems strange - what will happen when it hits 0, and why did it get so high in the first place?

Edit: I killed the program in hopes it wouldn't break anything. What else am I going to do from my end? But after starting it back up again, and starting a download, it goes right back to being stuck in the exact same way. It immediately manages 700~900MB/s I/O rate, presumably until I kill it again, to whatever device that can handle it (SSD is still basically idle).

thany avatar Jul 14 '22 20:07 thany

It unstuck itself after a good long while. I left it by itself, so I can't see how long it took. Too long, either way 😀

The questions that remain: how could this have happened, and what options do we have to prevent this?

thany avatar Jul 15 '22 09:07 thany

@MalloyDelacroix So I ran into this problem again. Any ideas by now, what could be causing this?

thany avatar Aug 15 '22 18:08 thany

I can have a look at the database view if that might tell what the hecko is making it so slow, but you'll then have to tell me exactly what to click on, because the database view really feels like a hastily implemented support tool :)

By the by, I just noticed this: image

This tells me (correct me if I'm wrong) that it constantly opens and closes handles to the database file, and therefore constantly opens and closes the database - maybe even from multiple threads as well. This might indicate why it's so slow, even though it isn't (or shouldn't be) deleting a massive amount of records.
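If the handle churn really does come from opening a fresh connection for every operation, one common remedy is to keep one connection per thread and reuse it. The sketch below is a generic illustration using Python's built-in `sqlite3` module, not DFR's actual code; `get_connection` and the per-thread cache are hypothetical names made up for this example.

```python
import sqlite3
import threading

# Each thread gets its own attribute namespace on this object.
_local = threading.local()


def get_connection(path):
    """Return this thread's cached connection for `path`, opening it on first use."""
    cache = getattr(_local, "connections", None)
    if cache is None:
        cache = _local.connections = {}
    if path not in cache:
        # sqlite3 connections should not be shared across threads by default,
        # so every thread opens (and then keeps) its own.
        cache[path] = sqlite3.connect(path)
    return cache[path]
```

With this pattern the database file is opened once per thread instead of once per query, which should show up as a stable handle count in a tool like ProcessHacker.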

thany avatar Aug 15 '22 18:08 thany

One more update: It's been sitting there, downloading nothing for literally the last 12 hours!

All it's been doing, is using up CPU cycles, about a GB of memory, and tens of MB/s of READ on my C-drive. Downloads are supposed to be written to my H-drive - only the database and the program are on the C-drive.

So I decided to clear out the content and post tables using a SQLite tool. Things are much faster now. Seems like this program needs a function to clean up its own database... SQLite databases shouldn't be allowed to grow forever, it's not a database library suitable for that. Proper database servers are more suitable, but of course that doesn't fit in a standalone program, so... I dunno what the best option is in this case.

Maybe you could start by putting some indices on the tables, on the fields you need to access. Perhaps you were looping through my 800,000+ posts table "by hand" for each iteration of the download process, or something. Indices help. Maybe then it can be allowed to grow a bit bigger before it grinds to a halt once more.
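To illustrate what an index changes, here is a minimal sketch using Python's built-in `sqlite3` module. The `content` table name comes from this thread, but the `post_id` and `url` columns, the index name `ix_content_url`, and the sample data are assumptions for the example and may not match DFR's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real content table.
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, post_id INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO content (post_id, url) VALUES (?, ?)",
    ((i, f"https://example.com/{i}.jpg") for i in range(10_000)),
)


def plan(connection, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


lookup = "SELECT id FROM content WHERE url = ?"
args = ("https://example.com/42.jpg",)

plan_before = plan(conn, lookup, args)  # no index yet: SQLite scans the table
conn.execute("CREATE INDEX ix_content_url ON content (url)")
plan_after = plan(conn, lookup, args)   # with the index: SQLite searches it

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table and the second a search that uses `ix_content_url`. The exact wording of the plan text varies between SQLite versions.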

thany avatar Aug 16 '22 07:08 thany

I would not have thought that database access was the issue slowing it down this much. It sounds like I need to do a deep dive into making the database operate more efficiently and make a database cleanup module when I have the time to do so again.

Thanks for the testing and information. This will give me a solid direction to head in.

MalloyDelacroix avatar Aug 16 '22 17:08 MalloyDelacroix

No problem. Feel free to provide a test version if you're comfortable to do so, because I've kept my "slow database" around for future testing.

thany avatar Aug 16 '22 18:08 thany

@MalloyDelacroix Maybe this will help you along. These are the queries I need to execute to get the program "going" again:

delete from content;
delete from post;
delete from subreddit where id in (select id from reddit_object where new=1);
delete from user where id in (select id from reddit_object where new=1);
delete from reddit_object where new=1;

Now, I'm not 100% sure about the new=1 - this was seemingly the way to select whether an object is in one of the users or subreddits lists that is visible to the end user. I couldn't find any other reliable way to determine this.

So essentially this deletes anything that doesn't need to be kept around purely for leeching.

Another really interesting thing: when I reverse the first two queries, it takes absolutely forever. I let it sit for half an hour or so and then killed my SQLite tool. In the order above, the same records are deleted in under a second. I don't know why this is, but hopefully it gives you some insight into why DFR is so slow. Perhaps it does things in a way that is massively slow in SQLite, where doing the same thing in a different order would be a lot snappier.
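As a rough sketch, the same cleanup could be scripted with Python's built-in `sqlite3` module, keeping the order that was fast and wrapping everything in one transaction. `clean_database` is a hypothetical helper, not part of DFR. One possible explanation for the order sensitivity, which is only a guess and not confirmed in this thread: if `content` rows reference `post` through an enforced foreign key whose child column has no index, every deleted post makes SQLite scan `content` for referencing rows.

```python
import sqlite3

# Same statements as above, in the order that completed quickly.
CLEANUP_STATEMENTS = (
    "DELETE FROM content",
    "DELETE FROM post",
    "DELETE FROM subreddit WHERE id IN (SELECT id FROM reddit_object WHERE new = 1)",
    "DELETE FROM user WHERE id IN (SELECT id FROM reddit_object WHERE new = 1)",
    "DELETE FROM reddit_object WHERE new = 1",
)


def clean_database(conn):
    """Run the cleanup statements in one transaction, then compact the file."""
    with conn:  # commits if every statement succeeds, rolls back otherwise
        for statement in CLEANUP_STATEMENTS:
            conn.execute(statement)
    # VACUUM cannot run inside a transaction, so it comes after the commit.
    conn.execute("VACUUM")
```

Back up the database file before running anything like this, since the deletes cannot be undone once committed.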

One other thing that stands out, possibly completely unrelated to the issue at hand, now that it's not constantly blasting the CPU at full horsepower anymore, is that it sometimes sits completely idle for seconds on end. No CPU, disk I/O, network I/O at all. Just waiting, I guess, but waiting for what? Please note that I'm on 1Gbps fiberoptic, so I hope it's not waiting for anything remote.

thany avatar Aug 17 '22 19:08 thany

Not only is this still happening, I've noticed a different kind of freeze.

I'm also seeing the program just waiting around. No notable CPU utilisation, no significant I/O activity, and nothing on the network. What's it doing? It appears to be waiting for something. But even "debug" logging does not reveal what it's actually currently doing.

And then when I stop the download, the whole program closes. Is that a crash? I don't see any errors... I wish this program was a "set and forget" kind of deal, but you really have to hold its hand. Stopping it and restarting it, cleaning out the database, over and over and over. And then when I start it right back up, it's purring along like a kitten as if nothing ever went south.

I don't get it. Something must be wrong buried in there somewhere.

thany avatar Nov 20 '22 15:11 thany

I assume the app's database operations might be a weak point. Judging by the database file sizes some users have reported, I believe I way underestimated the number of downloads that users would be performing and the amount of data that would be stored in the database. I did not prioritize database efficiency enough because I didn't think it would ever be a problem. I was wrong.

I hope to fix this in the future when I have time to do a massive update.

There aren't any spots in the app that should hang for a significant amount of time. The longest downtime you should ever see is if you are extracting a large amount of content from reddit. Individual extracts are not reported to users, so this can look like the app is frozen. But this should not last as long as you are reporting.

If you have millions of content urls stored in your database, you may see better results if you disable the duplicate check. I suspect this query may be responsible for some delays in very large databases.

MalloyDelacroix avatar Nov 21 '22 13:11 MalloyDelacroix

I did not prioritize database efficiency enough because I didn't think it would ever be a problem. I was wrong.

I wouldn't say that. Sqlite is a solid framework, just not built to withstand very large amounts of records as well as a true RDBMS does. Maybe a different system would've been better, maybe Sqlite could've been set up better. But I'm assuming at the time you worked with the information you had, and I trust you made the best choice based on that. Things can change, and perhaps now a different setup makes more sense. Don't blame yourself, is what I'm saying.

I hope to fix this in the future when I have time to do a massive update.

That's okay. This is a hobby project after all, and I'm not demanding anything, so do take your time. I can cope in the meantime.

If I knew python I would've looked into it as well, but I'm totally clueless in python 🤷🏻‍♂️ - I'm okay with Node.js, but that ain't helping you, is it 😀 However, if you need me to try something out, or need to know anything, feel free to ask away.

you may see better results if you disable the duplicate check

The duplicate check is a viable thing to try. I've disabled it right away. Let's see how that goes.

thany avatar Nov 21 '22 15:11 thany