don't use immutable=1, only mode=ro
Opening db files in immutable mode sometimes leads to the file being mutated, which causes duplication in the docker image layers: see #1836, #1480
That this happens in "immutable" mode is surprising, because the sqlite docs say that setting this should open the database as read only.
https://www.sqlite.org/c3ref/open.html
immutable: The immutable parameter is a boolean query parameter that indicates that the database file is stored on read-only media. When immutable is set, SQLite assumes that the database file cannot be changed, even by a process with higher privilege, and so the database is opened read-only and all locking and change detection is disabled. Caution: Setting the immutable property on a database file that does in fact change can result in incorrect query results and/or SQLITE_CORRUPT errors. See also: SQLITE_IOCAP_IMMUTABLE.
Perhaps this is a bug in sqlite?
:books: Documentation preview :books:: https://datasette--1870.org.readthedocs.build/en/1870/
Relevant sqlite forum thread: https://www.sqlite.org/forum/forumpost/02f7bda329f41e30451472421cf9ce7f715b768ce3db02797db1768e47950d48
Codecov Report
Base: 92.55% // Head: 92.55% // No change to project coverage :thumbsup:
Coverage data is based on head (
4faa4fd) compared to base (bf00b0b). Patch has no changes to coverable lines.
Additional details and impacted files
@@ Coverage Diff @@
## main #1870 +/- ##
=======================================
Coverage 92.55% 92.55%
=======================================
Files 35 35
Lines 4432 4432
=======================================
Hits 4102 4102
Misses 330 330
| Impacted Files | Coverage Δ | |
|---|---|---|
| datasette/app.py | 94.30% <ø> (ø) |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
as far as i can tell, this is where the "immutable" argument is used in sqlite:
pPager->noLock = sqlite3_uri_boolean(pPager->zFilename, "nolock", 0);
if( (iDc & SQLITE_IOCAP_IMMUTABLE)!=0
|| sqlite3_uri_boolean(pPager->zFilename, "immutable", 0) ){
vfsFlags |= SQLITE_OPEN_READONLY;
goto act_like_temp_file;
}
so it does set the read only flag, but then has a goto.
Just saw your comment here: https://github.com/simonw/datasette/issues/1836#issuecomment-1272357976
when you are running from docker, you always will want to run as
mode=robecause the same thing that is causing duplication in the inspect layer will cause duplication in the final container read/write layer whendatasette serveruns.
I don't understand this. My mental model of how Docker works is that the image itself is created using docker build... but then when the image runs later on (docker run) the image itself isn't touched at all.
Are you saying that I can build a container, but then when I run it and it does datasette serve -i data.db ... it will somehow modify the image, or create a new modified filesystem layer in the runtime environment, as a result of running that serve command?
Saw your comment here too: https://github.com/simonw/datasette/issues/1480#issuecomment-1271101072
switching from
immutable=1tomode=rocompletely addressed this. see https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651 for details.
So maybe we need a special case for containers that are intended to be run using Docker - the ones produced by datasette package and datasette publish cloudrun? Those are cases where the -i option should actually be opened in read-only mode, not immutable mode.
Maybe a datasette serve --irw data.db option for opening a file in immutable-but-actually-read-only mode? Bit ugly though.
I should run some benchmarks to figure out if immutable really does offer significant performance benefits.
Are you saying that I can build a container, but then when I run it and it does
datasette serve -i data.db ...it will somehow modify the image, or create a new modified filesystem layer in the runtime environment, as a result of running thatservecommand?
Somehow, datasette serve -i data.db will lead to the data.db being modified, which will trigger a copy-on-write of data.db into the read-write layer of the container.
I don't understand how that happens.
it kind of feels like a bug in sqlite, but i can't quite follow the sqlite code.
Hello! Resurrecting this issue since we're running into something similar with data.catalyst.coop as our database files have ballooned up to several GB. Our Cloud Run revisions now require huge amounts of RAM to start up without receiving a SIGBUS.
I'd love to see this fix merged in. It sounds like we want to make the immutable/read-only mode decision more flexible before doing so, so that we can use ro in Docker and immutable outside. If that sounds right, I'm happy to take a crack at adding that as a command-line flag or something that gets set automatically based on the expected execution environment.