datasette Explore if SquashFS can be used to shrink size of packaged Docker containers

Open simonw opened this issue 7 years ago • 4 comments

Inspired by this article: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html#sqlite-database-indexed--squashed

https://en.wikipedia.org/wiki/SquashFS is "a compressed read-only file system for Linux" - which means it could be a really nice fit for Datasette and its read-only SQLite databases.

It would be interesting to explore a Dockerfile recipe that used SquashFS to compress the SQLite database file that was bundled up by datasette package and friends.

Jun 24 '18 18:06 simonw

Relevant: https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/

Jul 13 '18 18:07 simonw

See https://github.com/simonw/datasette/issues/657 and my changes that allow datasette to load parquet files

Feb 11 '20 14:02 dazzag24

I tried this on fly.io. This particular database goes from 1.4GB to 200MB once squashed. It is slower to serve, though part of that might be down to having no --inspect-file?

$ datasette publish fly ...   --generate-dir /tmp/deploy-this
...
$ mksquashfs large.db large.squashfs
$ rm large.db # don't accidentally put it in the image
$ cat Dockerfile
FROM python:3.8
COPY . /app
WORKDIR /app

ENV DATASETTE_SECRET 'xyzzy'
RUN pip install -U datasette
# RUN datasette inspect large.db --inspect-file inspect-data.json
ENV PORT 8080
EXPOSE 8080
CMD mount -o loop -t squashfs large.squashfs /mnt; datasette serve --host 0.0.0.0 -i /mnt/large.db --cors --port $PORT

It would also be possible to copy the file onto the ~6GB available on the ephemeral container filesystem on startup. A little against the spirit of the thing? In this example the whole Docker image is 2.42 GB with the plain database and 1.14 GB with the squashfs version.
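
A rough sketch of that copy-on-startup idea, assuming the squashfs image is already mounted at /mnt as in the CMD above. The function name, the destination directory and the headroom value are invented for illustration and are not part of Datasette; only the Python standard library is used.

```python
import os
import shutil


def copy_db_if_room(src, dest_dir, headroom=64 * 1024 * 1024):
    """Copy src into dest_dir if there is room; return the path to serve.

    Falls back to the original path (e.g. the squashfs mount) when the
    destination filesystem lacks space. `headroom` is an arbitrary
    safety margin in bytes, chosen here purely as an example.
    """
    size = os.path.getsize(src)
    free = shutil.disk_usage(dest_dir).free
    if free < size + headroom:
        return src  # not enough room: keep serving from the mount
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    return dest
```

The returned path is what would then be handed to `datasette serve -i`, so the container still starts if the ephemeral disk turns out to be too small.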

Feb 17 '22 23:02 dholth

On second thought, any kind of quick-to-decompress-on-startup format could be helpful if we're paying for the container registry and deployment bandwidth but not for ephemeral storage.
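
As a minimal sketch of decompress-on-startup, assuming the database was shipped as a hypothetical large.db.xz instead of a squashfs image: xz is used here only because `lzma` is in the Python standard library, and a faster codec such as zstd would probably suit the "quick" part better. The function name and file names are illustrative.

```python
import lzma
import shutil


def unpack_db(compressed_path, dest_path):
    """Stream-decompress an .xz file to dest_path.

    Copies in 1 MiB chunks so the whole database never has to fit in memory.
    """
    with lzma.open(compressed_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    return dest_path
```

A startup script could call `unpack_db("large.db.xz", "/tmp/large.db")` and then start `datasette serve -i /tmp/large.db`, which avoids needing mount privileges in the container.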

Feb 17 '22 23:02 dholth
