
Tup database size can get very large

Open layus opened this issue 7 years ago • 2 comments

After building firefox with Tup, I discovered that the database takes >800MB on disk. This is about 6% of my git checkout + build outputs.

909M	gecko-dev/.tup       (tup internal state, including tup db)
2.2G	gecko-dev            (sources only)
4.9G	gecko-dev/.git       (git repo)
5.8G	gecko-dev/obj-tup    (build outputs)
14G	total

For those interested in more details, here is an sqlite3_analyzer run on it https://gist.github.com/layus/d4659d0341088d812efd77a3cdc14092

I am a bit worried that Tup's database grows this large. It looks like there is some n² factor in the number of normal links, which is to be expected in a graph (the number of edges is at most n², n being the number of nodes). This may, however, be something to improve in the future.
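To illustrate why the link table can grow multiplicatively, here is a toy count. The numbers are entirely hypothetical and not taken from the Firefox build; the point is only that a modest number of shared headers times a large number of compilation units already yields tens of millions of edges:

```python
# Toy illustration of the multiplicative growth of file-level dependency
# edges.  If every compilation unit records a link to every header it can
# see, the link count is (units x headers), approaching the n^2 bound for
# a graph with n nodes.  All figures below are made up for illustration.
n_headers = 3000       # hypothetical count of widely-included headers
n_objects = 20000      # hypothetical count of compilation units
links = n_headers * n_objects
print(links)           # 60000000 edges from header inclusion alone
```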

layus avatar Aug 07 '18 13:08 layus

Are you finding that the database size has an impact on runtime performance? Tup doesn't need to load the whole db on incremental builds, so the full size may not be as much of a factor there.

As for reducing the size, we could potentially reduce the size of the node table by making the 'name' field an id that points into another table. For compilation commands with lots of -I flags, there are going to be too many duplicate ghost entries with the same filename, which could all share the same entry in a separate name_table. However, as your analyzer info shows, the node table only accounts for <17% of the total size, so that's unlikely to have a significant effect on the overall usage.
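A minimal sketch of that normalization, using an in-memory SQLite database. The table and column names (`name_table`, `node`, `name_id`) are illustrative only and do not reflect tup's actual schema:

```python
import sqlite3

# Sketch: deduplicate node names into a shared name_table, so that many
# ghost nodes referring to the same filename share one string row.
# Schema names here are hypothetical, not tup's real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE name_table (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE node (id INTEGER PRIMARY KEY, dir INTEGER,
                       name_id INTEGER REFERENCES name_table(id));
""")

def add_node(dir_id, name):
    # Reuse the existing name row when one already exists.
    con.execute("INSERT OR IGNORE INTO name_table(name) VALUES (?)", (name,))
    (name_id,) = con.execute(
        "SELECT id FROM name_table WHERE name = ?", (name,)).fetchone()
    con.execute("INSERT INTO node(dir, name_id) VALUES (?, ?)",
                (dir_id, name_id))

# A header probed in 50 different -I directories creates 50 ghost nodes,
# but only a single shared name entry.
for dir_id in range(50):
    add_node(dir_id, "mozilla-config.h")

(names,) = con.execute("SELECT COUNT(*) FROM name_table").fetchone()
(nodes,) = con.execute("SELECT COUNT(*) FROM node").fetchone()
print(names, nodes)  # prints "1 50"
```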

Unfortunately I don't think there's an easy way for tup to reduce the size of the normal_link table, since those are the actual file-level dependencies of the subprocesses, and it is already a simple 'integer -> integer' relation.

The best thing we could do here for Firefox specifically is to reduce the usage of -I flags. This is something we had talked about in the past - essentially changing from the current use of EXPORTS + LOCAL_INCLUDES to just doing -Itopsrcdir -Itopobjdir, and using #include "path/from/topXdir" in source files. Some compilation commands have upwards of 50+ include dirs, so if a header isn't found until the end of the -I chain, we potentially have 50x as many normal_links as a simpler set of -I flags would require.
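The probing behavior behind those ghost entries can be sketched as follows. This is a toy model of quoted-include resolution, not tup's or the compiler's actual code: each -I directory is probed in order, and every miss is a path whose *absence* the build depends on, i.e. a ghost dependency:

```python
import os
import tempfile

def resolve(header, include_dirs):
    """Probe each -I directory in order; every failed probe is a path
    the build result depends on not existing (a ghost dependency)."""
    misses = []
    for d in include_dirs:
        path = os.path.join(d, header)
        if os.path.exists(path):
            return path, misses
        misses.append(path)
    return None, misses

# Build 50 toy include dirs; the header exists only in the last one.
root = tempfile.mkdtemp()
dirs = [os.path.join(root, "inc%d" % i) for i in range(50)]
for d in dirs:
    os.mkdir(d)
with open(os.path.join(dirs[-1], "foo.h"), "w") as f:
    f.write("/* header */\n")

found, misses = resolve("foo.h", dirs)
print(len(misses))  # prints "49": 49 ghost paths for one #include line
```

Collapsing the chain to a couple of top-level -I flags would shrink the miss list, and with it the per-command link count.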

gittup avatar Aug 07 '18 15:08 gittup

No particular worries here; I mainly wanted to report a fact that may come as a surprise to other users: the database is of the same order of magnitude in size as the sources themselves.

layus avatar Aug 07 '18 21:08 layus