davros
Large grain crashes Davros
I have a grain that I used to sync all my photos to, so it is about 11 GB, and it can no longer start.
As best I can tell from the Sandstorm logs, the problem is that the preview file cache stores its data in tmpdir, which is memory-backed inside the grain, so it fills up and the grain crashes.
Hmm, that's something I had not considered! I could have the thumbnailer expire images via LRU or something, probably as a background job. Does this happen over a long period of time, or do you have a single large directory where merely viewing it fills up the memory and crashes?
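A minimal sketch of the LRU expiry idea, in Python for illustration only (this is not Davros's code; the function name, cache layout, and byte budget are assumptions). It walks a thumbnail cache directory and deletes the least recently used files until the total size fits under a cap. Recency is approximated by mtime, since many filesystems are mounted with relatime or noatime, which makes atime unreliable; the sketch assumes the server refreshes a thumbnail's mtime (for example with os.utime) each time it serves it.

```python
from pathlib import Path


def prune_thumbnail_cache(cache_dir: str, max_bytes: int) -> list[str]:
    """Delete least-recently-used files until the cache fits in max_bytes.

    Hypothetical helper. Recency = file mtime, which the caller is assumed
    to refresh whenever a thumbnail is served. Returns the removed paths.
    """
    entries = []
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            st = path.stat()
            entries.append((st.st_mtime, st.st_size, path))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Oldest mtime first: these are the least recently used thumbnails.
    for _, size, path in sorted(entries, key=lambda e: e[0]):
        if total <= max_bytes:
            break
        path.unlink()
        total -= size
        removed.append(str(path))
    return removed
```

A background job could call this on a timer with a fixed byte budget, so the cache never grows without bound regardless of how many photos the grain holds.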
Hmm, looking at it a bit more, I don't recall previews/thumbnails being stored on a memory-backed filesystem. If you start Davros outside of Sandstorm on a Unix system, it'll likely put thumbnails in /tmp, but within Sandstorm they end up in /var/davros/tmp, which should be on file storage rather than in memory:
https://github.com/mnutt/davros/blob/master/.sandstorm/sandstorm-pkgdef.capnp#L160
Maybe it's some sort of leak in the thumbnailing itself, or Davros trying to generate too many thumbnails at the same time?
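If the cause turns out to be too many simultaneous thumbnail jobs, a common mitigation is to cap concurrency. A sketch using Python's asyncio, assuming a hypothetical async make_thumbnail(path) coroutine (not a Davros function) and an arbitrary default limit of 4:

```python
import asyncio


async def generate_thumbnails(paths, make_thumbnail, max_concurrent=4):
    """Run make_thumbnail(path) for every path, with at most max_concurrent
    running at once, so opening a huge folder cannot start hundreds of
    resize jobs in parallel. make_thumbnail is a hypothetical coroutine."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(path):
        # Each job waits here until one of the max_concurrent slots is free.
        async with sem:
            return await make_thumbnail(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(run_one(p) for p in paths))
```

Memory use then scales with max_concurrent rather than with the number of images in the folder being viewed.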
This should probably be treated as a blocking issue for approval. I'm not sure how many people have exceptionally large Davros grains, but I am concerned that if we don't suss out the issue here, we will find out how many people have very large Davros grains. ;)
I know there was some further discussion on IRC; did we get anywhere in identifying exactly what the issue was? @griff, you mentioned some logs. Could you share them here, sanitized if necessary?
I have just put my new grain through its paces and can't reproduce my own problem, so I am closing this issue.
I first uploaded all the pictures stored on my computer to the grain (11 GB), spread across multiple folders.
Then I finished uploading all the pictures from my phone (10 GB) using the same method that was used to populate the failing grain. That method creates a single folder containing all 2600 images and videos, and while the Davros view of just that folder is a bit slow to load, I haven't noticed any breakage.
Sorry for the inconvenience!
I am able to reproduce this issue with a Davros grain that has over 1000 images. A backup of the grain is available here if anyone else wants to try. It contains a lot of NSFW language (it's a collection of memes I share with family and friends), but there is no nudity. https://2oibt9mht7i0o2w4is69.ducky.sandcats.io/Davros_funnies_in_line.zip
@griff or @mnutt, can we reopen this?
I have found the underlying problem that was causing my issue. It is this: https://github.com/sandstorm-io/sandstorm/issues/3512
Okay, that would arguably make this no longer a Davros issue, unless @mnutt intends to find some way to create fewer files when making thumbnails... which seems unrealistic?
Are you able to raise the fs.inotify.max_user_watches value on the box in question? It sounds like in kernel 5.11 and up, Linux sets this default more intelligently, based on the machine's memory.
Thank you to all who looked into this! I changed my fs.inotify.max_user_watches to 32768 and restarted; no dice. I do not see the error from sandstorm-io/sandstorm#3512 in my logs. I do see this in sandstorm.log now:
sandstorm/gateway.c++:1072: error: exception = kj/compat/http.c++:1851: failed: expected !inBody; previous HTTP message body incomplete; can't write more messages
stack: 4c8412 4ff5e4 4a99bf 4f60da 4f70d1 544fa1 4fe500
I would swear that error in the log is new. I haven't touched C++ in 16 years, but I'll take a look at that file and see if anything useful pops out at me.
@griff Did you see the error in your Sandstorm log by chance?
@Michael-S Do you know whether the setting you changed is machine-wide or per-user? I want to say it might be the latter, and Sandstorm runs as its own user account. (I don't even know how to set it; I'm just ballparking based on what I've read.)
I changed it in /etc/sysctl.conf and rebooted the VM, so I don't think that's it.
Edit: to inspect the value, run cat /proc/sys/fs/inotify/max_user_watches
You can change the value dynamically, but the easy way is to add a line to /etc/sysctl.conf:
fs.inotify.max_user_watches=32768
and then restart.
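A small Python sketch for estimating headroom, under one stated assumption: whatever watches the grain's storage adds one inotify watch per directory (inotify is not recursive, so recursive watchers generally work this way). The function names are hypothetical. Because the limit applies to all processes of the same user, watches held elsewhere reduce the real headroom further, so treat the result as an upper bound.

```python
import os

PROC_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_max_user_watches(proc_path: str = PROC_PATH) -> int:
    """Return the current fs.inotify.max_user_watches value."""
    with open(proc_path) as f:
        return int(f.read().strip())


def count_directories(root: str) -> int:
    """Count root plus every directory below it (assumed: one watch each)."""
    return 1 + sum(len(dirnames) for _, dirnames, _ in os.walk(root))


def watch_headroom(root: str, proc_path: str = PROC_PATH) -> int:
    """Watches left if every directory under root were watched. A negative
    result suggests inotify_add_watch would fail with ENOSPC, which is
    reported as "No space left on device"."""
    return read_max_user_watches(proc_path) - count_directories(root)
```

For example, watch_headroom on a grain's storage directory (path depends on the install) gives a rough sense of whether the current limit is close to being exhausted.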
@ocdtrekkie, it is not user specific.
🤔 So do @griff and @Michael-S have different issues, then? I am really curious whether @griff found the sandstorm/supervisor.c++:232: overloaded: inotify_add_watch: No space left on device errors in his system log, since @Michael-S did not.
@ocdtrekkie I got inotify_add_watch in the log, and increasing fs.inotify.max_user_watches fixed my issue. So it looks like these are different issues.
As a stupid check (on myself): I asked whether @Michael-S saw the inotify_add_watch error in the Sandstorm/system log, but the other issue specifies that it appears in the grain log. @Michael-S, there's nothing in the grain log for the grain that won't start for you, right?
Right, nothing in the grain log and nothing related to inotify in the Sandstorm log.
Okay, thanks. I figured, but just wanted to confirm.
Hmm, at some point maybe I can explore storing thumbnails in a SQLite database or something.
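One way that idea could look, sketched with Python's built-in sqlite3 module: a single database file holds every thumbnail as a BLOB instead of one file per thumbnail, keyed by source path and width, with the source file's mtime stored so stale entries are ignored. The class name and schema are hypothetical; Davros does not ship this.

```python
import sqlite3


class ThumbnailStore:
    """Hypothetical thumbnail cache in one SQLite file.
    Schema: (path, width) -> (source mtime, thumbnail bytes)."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS thumbnails (
                   path  TEXT    NOT NULL,
                   width INTEGER NOT NULL,
                   mtime REAL    NOT NULL,
                   data  BLOB    NOT NULL,
                   PRIMARY KEY (path, width)
               )"""
        )

    def put(self, path: str, width: int, mtime: float, data: bytes) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO thumbnails VALUES (?, ?, ?, ?)",
                (path, width, mtime, data),
            )

    def get(self, path: str, width: int, mtime: float):
        """Return cached bytes, or None if missing or the source changed."""
        row = self.conn.execute(
            "SELECT data FROM thumbnails WHERE path = ? AND width = ? AND mtime = ?",
            (path, width, mtime),
        ).fetchone()
        return None if row is None else row[0]
```

Regenerating a thumbnail for a changed photo simply overwrites the old row through INSERT OR REPLACE on the (path, width) primary key.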