Large grain crashes Davros

Open griff opened this issue 3 years ago • 19 comments

I have a grain that I used to sync all my photos to. It is about 11 GB, and it can no longer start.

As best I can tell from the Sandstorm logs, the problem is that the preview file cache stores its data in tmpdir, which is memory-backed in the grain, so the cache fills it up and the grain crashes.

griff avatar Sep 21 '21 20:09 griff

Hmm, that's something I had not considered! I could have the thumbnailer expire images via LRU or something, probably as a sort of background job. Is this something that happens over a long period of time, or do you have one single large directory where merely viewing it fills up the memory and crashes?
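For illustration only, here is a minimal sketch of what LRU expiry of a thumbnail directory could look like. It is not Davros code: the function name `evict_lru`, the flat cache directory, and the byte budget are all assumptions, and it relies on file access times, which some mounts (for example `noatime`) do not update.

```python
from pathlib import Path


def evict_lru(cache_dir: Path, max_bytes: int) -> list[Path]:
    """Delete least-recently-accessed files until the cache fits in max_bytes.

    Hypothetical helper: returns the removed paths, oldest access first.
    """
    entries = []
    for p in cache_dir.iterdir():
        if p.is_file():
            st = p.stat()
            entries.append((st.st_atime, st.st_size, p))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Walk files from oldest to newest access time, stopping once under budget.
    for _, size, p in sorted(entries, key=lambda e: e[0]):
        if total <= max_bytes:
            break
        p.unlink()
        total -= size
        removed.append(p)
    return removed
```

A background job could call something like this periodically with whatever size budget suits the grain.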

mnutt avatar Sep 22 '21 00:09 mnutt

Hmm, as I look at it a bit more I don't recollect previews/thumbnails being stored on a memory-backed filesystem. If you start davros outside of sandstorm on a unix system it'll likely put thumbnails in /tmp, but within sandstorm these end up in /var/davros/tmp, which should be file storage:

https://github.com/mnutt/davros/blob/master/.sandstorm/sandstorm-pkgdef.capnp#L160

Maybe it's some sort of leak in the thumbnailing itself, or davros trying to generate too many thumbnails at the same time?

mnutt avatar Sep 22 '21 02:09 mnutt

This probably should constitute a breaking issue for approval. I'm not sure how many people have exceptionally large Davros grains, but I am concerned that if we don't suss out the issue here, we will find out how many people have very large Davros grains. ;)

I know there was some further discussion on IRC, did we get anywhere in identifying exactly what the issue was? @griff, you mention some logs, can you share them here, by chance, sanitized if necessary?

ocdtrekkie avatar Sep 22 '21 19:09 ocdtrekkie

I have just put my new grain through its paces and I can't reproduce my own problem, so I am closing this issue.

I first uploaded all pictures stored on my computer to the grain (11 GB), but in multiple folders.

I have also just finished uploading all pictures from my phone (10 GB) using the same method that was used to populate the failing grain. It creates a single folder with all 2600 images and videos in it, and while loading the Davros view of just that folder is a bit slow, I haven't noticed any breakage.

griff avatar Sep 23 '21 00:09 griff

Sorry for the inconvenience!

griff avatar Sep 23 '21 00:09 griff

I am able to reproduce this issue with a Davros grain that has over 1000 images. A backup of the grain is available here if anyone else wants to try. It contains a lot of NSFW language - it's a collection of memes I share with family and friends. There is no nudity. https://2oibt9mht7i0o2w4is69.ducky.sandcats.io/Davros_funnies_in_line.zip

Michael-S avatar Oct 29 '21 01:10 Michael-S

@griff or @mnutt , can we reopen this?

ocdtrekkie avatar Oct 29 '21 13:10 ocdtrekkie

I have found the underlying problem that was causing my issue. It is this: https://github.com/sandstorm-io/sandstorm/issues/3512

griff avatar Nov 28 '21 23:11 griff

Okay, that would arguably make this no longer a Davros issue, unless @mnutt intends to find some way to create fewer files when making thumbnails... which seems unrealistic?

Are you able to raise your fs.inotify.max_user_watches value on the box in question? It sounds like in kernel 5.11 and up, Linux will more intelligently set this default value based on the memory of your machine.

ocdtrekkie avatar Nov 29 '21 02:11 ocdtrekkie

Thank you to all who looked into this! I changed my fs.inotify.max_user_watches to 32768 and restarted, no dice. I do not see the error in sandstorm-io/sandstorm#3512 in my logs. I do see this in sandstorm.log now:

sandstorm/gateway.c++:1072: error: exception = kj/compat/http.c++:1851: failed: expected !inBody; previous HTTP message body incomplete; can't write more messages
stack: 4c8412 4ff5e4 4a99bf 4f60da 4f70d1 544fa1 4fe500

I would swear that error in the log is new. I haven't touched C++ in 16 years, but I'll take a look at that file and see if anything useful pops out at me.

Michael-S avatar Nov 29 '21 15:11 Michael-S

@griff Did you see the error in your Sandstorm log by chance?

@Michael-S Do you know if that setting is machine or user specific where you changed it? I want to say it might be the latter, and sandstorm runs as its own user account. (I don't know how to set that setting even, just trying to ballpark guesses based on what I read.)

ocdtrekkie avatar Nov 29 '21 15:11 ocdtrekkie

I changed it in /etc/sysctl.conf and rebooted the VM, so I don't think that's it.

Edit: to inspect the value, run:

cat /proc/sys/fs/inotify/max_user_watches

You can change the value dynamically, but the easy way is to add this line to /etc/sysctl.conf and then restart:

fs.inotify.max_user_watches=32768
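For anyone who would rather check this from a script, the following is a small sketch under the same assumptions (a Linux box where the /proc path above exists). The helper names `read_max_user_watches` and `sysctl_line` are made up for illustration.

```python
from pathlib import Path

# Same file that the `cat` command above reads.
PROC_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def read_max_user_watches(path: Path = PROC_PATH) -> int:
    """Return the current inotify watch limit as an integer."""
    return int(path.read_text().strip())


def sysctl_line(limit: int) -> str:
    """Build the line to add to /etc/sysctl.conf for a given limit."""
    return f"fs.inotify.max_user_watches={limit}"
```

Calling `read_max_user_watches()` after the reboot should confirm whether the new limit actually took effect.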

Michael-S avatar Nov 29 '21 15:11 Michael-S

@ocdtrekkie, it is not user specific.

zenhack avatar Nov 29 '21 17:11 zenhack

🤔 So do @griff and @Michael-S have different issues then? I am really curious if @griff found the sandstorm/supervisor.c++:232: overloaded: inotify_add_watch: No space left on device errors in his system log then, since @Michael-S did not.

ocdtrekkie avatar Nov 29 '21 18:11 ocdtrekkie

@ocdtrekkie I got inotify_add_watch in the log, and increasing fs.inotify.max_user_watches fixed my issue. So these look like two different issues.

griff avatar Nov 29 '21 19:11 griff

As a stupid check (on myself): I asked whether @Michael-S saw the inotify_add_watch error in the Sandstorm/system log, but the other issue specifies that it appears in the grain log. @Michael-S Nothing in the grain log for the grain that won't start for you, right?

ocdtrekkie avatar Nov 29 '21 20:11 ocdtrekkie

Right, nothing in the grain log and nothing related to inotify in the Sandstorm log.

Michael-S avatar Nov 29 '21 20:11 Michael-S

Okay, thanks. I figured, but just wanted to confirm.

ocdtrekkie avatar Nov 29 '21 22:11 ocdtrekkie

Hmm, at some point maybe I can explore storing thumbnails in a SQLite database or something.
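A rough sketch of the SQLite idea, purely as an illustration and not a design for Davros: all thumbnails live as BLOBs in one database file, keyed by the source path, instead of one file per image. The table name, column names, and helper functions are hypothetical.

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-file thumbnail store."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnails ("
        " source_path TEXT PRIMARY KEY,"
        " data BLOB NOT NULL)"
    )
    return conn


def put_thumbnail(conn: sqlite3.Connection, source_path: str, data: bytes) -> None:
    """Insert or overwrite the thumbnail for one source image."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO thumbnails (source_path, data) VALUES (?, ?)",
            (source_path, data),
        )


def get_thumbnail(conn: sqlite3.Connection, source_path: str):
    """Return the stored thumbnail bytes, or None if there is none."""
    row = conn.execute(
        "SELECT data FROM thumbnails WHERE source_path = ?", (source_path,)
    ).fetchone()
    return row[0] if row else None
```

Whether this would actually help with the watch limit depends on how the supervisor watches grain files, which this sketch does not address.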

mnutt avatar Nov 30 '21 00:11 mnutt