nix icon indicating copy to clipboard operation
nix copied to clipboard

Rebuild sqlite db from scratch?

Open bhipple opened this issue 6 years ago • 29 comments
trafficstars

If nix's sqlite db cache is lost/corrupted/destroyed in some way, is there a way to perform the O(N) operation to rescan the store on disk and rebuild the sqlite db from scratch?

It looks like nix-store --init will initialize a new empty db that appears operational, but this db has no knowledge of what's in the nix store on the host and will try to re-download things when a command like nix copy is run.

The nix-store --verify --check-contents --repair option looks like what I want, but it iterates over the store paths in the sqlite db, checking that the filesystem paths correspond to what's in the db; I'd like the reverse, since my sqlite db is empty.

The command nix-store --add /nix/store/* will go and claim to add all the paths to the nix store db, and it does get bigger, but it still appears to fail when I run a command like nix-store --verify on the paths later.

Is there a way to do this using today's commands?

bhipple avatar Sep 16 '19 00:09 bhipple

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/restore-a-backup-of-nix-store/6296/5

nixos-discourse avatar Mar 18 '20 16:03 nixos-discourse

I am interesting in this since my sqlite seems to be corrupted

nix-store --verify --repair --check-contents
...
1593 store paths deleted, 1362.63 MiB freed
error: executing SQLite statement 'delete from ValidPaths where path = ?;': constraint failed (in '/nix/var/nix/db/db.sqlite')

fzakaria avatar Aug 13 '20 03:08 fzakaria

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zfs-dedup-on-nix-store-is-it-worth-it/4959/7

nixos-discourse avatar Aug 19 '20 18:08 nixos-discourse

There is no correct way to do this, since without the database, we can't distinguish between valid and invalid paths (e.g. partial build outputs).

edolstra avatar Aug 19 '20 19:08 edolstra

I marked this as stale due to inactivity. → More info

stale[bot] avatar Feb 15 '21 19:02 stale[bot]

If I know that there are no invalid paths (or don't care about them), is it be possible to rebuild it from the current state anyways?

I'd much rather delete a few broken paths here or there than wipe clean my whole Nix store to get store optimisation to work again.

Atemu avatar Apr 02 '21 21:04 Atemu

Is there any interest in this issue? The capabilitity seems really, really valuable when you need it, and easy to ignore if you don't run into corruption or if you have infinite bandwidth.

What does it take to implement this? How feasible/difficult is the task?

ketzacoatl avatar Apr 10 '21 14:04 ketzacoatl

Ability to recreate sqlite stuff if it damaged would be very nice, Typical use case -- you have build-able configuration.nix and working cache nearby, so it should be possible to build/switch "damaged" host. Second one -- if we have sibling host with similar /store content, so should be possible check/re-download from cache or sibling host.

avnik avatar Jun 18 '21 11:06 avnik

I marked this as stale due to inactivity. → More info

stale[bot] avatar Jan 03 '22 21:01 stale[bot]

Not stale, please keep open.

ketzacoatl avatar Jan 04 '22 00:01 ketzacoatl

error: executing SQLite statement 'delete from ValidPaths where path = ?;': constraint failed (in '/nix/var/nix/db/db.sqlite')

Ran into this recently. Is there a solution to a DB error like this?

pinpox avatar Jan 17 '22 10:01 pinpox

The only solution to db errors that isn't wiping the Nix store is to manually edit the database and remove bad entries.

Atemu avatar Jan 17 '22 14:01 Atemu

Hm ok. I ran the --verify --repair multiple times after collecting garbage and the error seems to have been fixed. I assume it deleted the bad entries then. Some way of recreating or checking the db would be nice though

pinpox avatar Jan 17 '22 14:01 pinpox

I could fix my database by running

sqlite ".dump" /nix/var/nix/db./db/db.sqlite > db.sql

Then i changed the line ROLLBACK; to COMMIT; moved dp.sqlite to a backup and run

sqlite /nix/var/nix/db./db/db.sqlite < db.sql

If i really lost my db.sqlite how would i get my nix working again on NixOS. Deleting the nix store would render my system unusable. Could I do this from a live nixos boot stick and then rebuild the system?

StillerHarpo avatar May 21 '22 06:05 StillerHarpo

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-store-sqlite-db-corruption/22841/1

nixos-discourse avatar Oct 30 '22 01:10 nixos-discourse

I could fix my database by running

sqlite ".dump" /nix/var/nix/db./db/db.sqlite > db.sql

Then i changed the line ROLLBACK; to COMMIT; moved dp.sqlite to a backup and run

sqlite /nix/var/nix/db./db/db.sqlite < db.sql

If i really lost my db.sqlite how would i get my nix working again on NixOS. Deleting the nix store would render my system unusable. Could I do this from a live nixos boot stick and then rebuild the system?

Thank you so much for your solution, this saved my NixOS systems at least twice. For some reasons, when reaching 100% of disk space utilization, things start to get corrupted :sweat:

rapenne-s avatar Dec 28 '22 09:12 rapenne-s

@rapenne-s was this on btrfs?

Atemu avatar Dec 28 '22 13:12 Atemu

Yes, I did run nix-collect-garbage with the filesystem full, so I suppose changes were made to the sqlite file before actually freeing space, and it got corrupted.

rapenne-s avatar Dec 28 '22 14:12 rapenne-s

Something similar happen to me on ZFS just now, when a power outage reset my desktop.

[nix-shell:~]$ sudo sqlite3 /nix/var/nix/db/db.sqlite.bak 'pragma integrity_check'
row 34013 missing from index sqlite_autoindex_ValidPaths_1

nix-store --verify --check-contents --repair was also throwing errors till I manually fixed the DB. Maybe something could be done to make nix handle DB corruptions better.

illustris avatar Jan 04 '23 01:01 illustris

error: executing SQLite statement 'delete from ValidPaths where path = ?;': constraint failed (in '/nix/var/nix/db/db.sqlite')

Ran into this recently. Is there a solution to a DB error like this?

I have this error when I clean my store (that forbid me to clean my store further)

So I have this script:

#!/usr/bin/env nix-shell
#! nix-shell -p sqlite -i bash

set -euo pipefail

until nix-collect-garbage -d {fd}>&1 1>&2 2>&${fd} | sed -n "s|.*nt '\(.*\)':.*|\1|gp" | sqlite3 /nix/var/nix/db/db.sqlite
do
 :
done

nt in my regex is for statement (I could be more precise if you want)

You should be able to run any command instead of nix-collect-garbage -d (maybe the same with a very old date so you don't actually remove file).

It will stop when the message doesn't appear. This script working on Darwin without GNU extension. I have one a bit easier but it execute command with sed 🤯.

Et7f3 avatar Feb 21 '23 23:02 Et7f3

I can understand Nix not wanting to provide "recreate the DB from existing store" as a first class feature if it doesn't know that it can trust the paths that it finds on the filesystem to be complete and correct. At the same time, the schema for the DB is super simple and doesn't change often:

https://github.com/NixOS/nix/blob/master/src/libstore/schema.sql

So it seems quite conceivable that a trivial external tool could be written to take care of this task, or it could be an optional integration with stuff like dockerTools to make it possible to use Nix itself inside the resulting image.

mikepurvis avatar Mar 23 '23 20:03 mikepurvis

@mikepurvis As far as I can tell, the Refs table of the database cannot be recreated when it comes to runtime dependencies, which (unfortunately) prevents such a tool being possible. (I would love to be proven wrong, so I welcome counter info.)

As I understand it, the references stored in the Refs table for a .drv file are build-time dependencies, but for a directory(?)/output(?), they are runtime dependencies. Also, .drv files store build dependencies/references, but not runtime dependencies/references. Thus, there is no metadata from which those runtime dependencies could be derived.

adamsmd avatar Oct 08 '23 17:10 adamsmd

This isn't exactly what OP is asking, but in case it's some minor corruption you might want to try the following. I experienced a disk I/O error with db.sqlite, so I first run ddrescue to skip the problematic bytes.

sudo su -
cd /nix/var/nix/db
nix-shell -p sqlite ddrescue
systemctl stop nix-daemon.socket nix-daemon.service
cp db.sqlite /tmp/ # a backup doesn't hurt, but might throw errors depending on your filesystem and the specific issue with db.sqlite
mv db.sqlite db.sqlite.broken
ddrescue db.sqlite.broken db.sqlite
sqlite3 db.sqlite
.recover
.quit
nix-store --verify --check-contents # -> didn't report any issues
rm db.sqlite.broken /tmp/db.sqlite

wpiekutowski avatar Nov 03 '23 13:11 wpiekutowski

I've somehow ended up with a very corrupt nix store on a remote system (which I can't easily reboot into a rescue image), discovered when trying to nixos-rebuild build.

Every variation of nix-store or nix store with various combinations of --repair, --repair-path, or --verify fails.

nix build .nixosConfigurations... also fails. Tried remounting the root to /mnt and running nixos-install but this also fails.

I keep seeing nixos-rebuild --repair recommended, but it doesn't look like that command exists anymore.

No luck with the above sqlite commands; the integrity_check shows ok.

Also on BTRFS, plenty of space, both a balance and a scrub show no errors.

Was hoping that I could just start from scratch and redownload everything with nix-store --init, but that apparently doesn't do what I thought it would do:

# mv /nix/var/nix/db/db.sqlite{,.bak}
# nix-store --init
error: cannot open SQLite database '/nix/var/nix/db/db.sqlite': unable to open database file
# touch /nix/var/nix/db/db.sqlite
# nix-store --init
error: creating statement 'insert into ValidPaths (path, hash, registrationTime, deriver, narSize, ultimate, sigs, ca)
 values (?, ?, ?, ?, ?, ?, ?, ?);': SQL logic error, no such table: ValidPaths (in '/nix/var/nix/db/db.sqlite')
# sqlite3 /nix/var/nix/db/db.sqlite < <(curl 'https://raw.githubusercontent.com/NixOS/nix/master/src/libstore/schema.sql')
# nix-store --init
error: creating statement '
                       insert into Realisations (drvPath, outputName, outputPath, signatures)
                       values (?, ?, (select id from ValidPaths where path = ?), ?)
                       ;
                   ': SQL logic error, no such table: Realisations (in '/nix/var/nix/db/db.sqlite')

Trying a more manual approach after backing up and dumping the database per instructions above (probably should not have used parallel):

$ sudo systemctl stop nix-daemon
$ checkDrv() { [ -s "$1" ] || echo "INSERT INTO ValidPath.*$1"; }
$ export -f checkDrv
$ awk -F"'" '/^INSERT INTO ValidPath/ && $2 ~ /.*\.drv/ { print $2 }' db.sqlite.dumped |
    parallel checkDrv |
    tee broken.txt
$ grep -v -f broken.txt db.sqlite.dumped > db.sqlite.dumped.fixed
$ wc -l db.sqlite.dumped*
  408903 db.sqlite.dumped
  408847 db.sqlite.dumped.fixed
  817750 total
$ sudo mv /nix/var/nix/db/db.sqlite{,.bak2}
$ sudo sqlite3 /nix/var/nix/db/db.sqlite < db.sqlite.dumped.fixed
$ sudo systemctl restart nix-daemon

Unsurprisingly, that also didn't work:

$ nixos-rebuild build
building the system configuration...
error: opening file '/nix/store/7h2127933b63m4ldv115f0q5g982alfs-bc-1.07.1.drv': No such file or directory
$ sudo rg -u --files-with-matches 7h2127933b63m4ldv115f0q5g982alfs /nix/var/nix/db/*
/nix/var/nix/db/db.sqlite.bak

Is there no way to start a new store from scratch (or otherwise fix a broken db.sqlite) without the ability to reboot from installation media?

n8henrie avatar Dec 11 '23 20:12 n8henrie

Since you're on btrfs, if you use a separate subvol mount for /nix, you could try to nixos-install (might need --system) your current system into a new nix subvolume and then swap out the old broken subvol with the new one.

Atemu avatar Dec 12 '23 01:12 Atemu

Well the nixos-install seemed to work at least, but when I swapped out the subvols and rebooted something went wrong and I lost all (remote / ssh) access to the system. Oh well, not a mission-critical machine, I'll report back in February when I'm next able to physically access it. Thanks for the suggestion!

n8henrie avatar Dec 12 '23 15:12 n8henrie

The sqlite db being a source of truth for state, and not merely a performance optimization—and thus not re-creatable from other (meta)data—and that db not being reliably updated by default and it not being stored on a read-only filesystem (unlike /nix/store) is something worth fixing, I think.

The combination of these factors means that an immutable atomic-upgrade system can still end up in a state that's unrecoverable without drastic measures like a complete reinstall. The fact that nix store --init seems to be a no-op doesn't help. Would be great if this command created a valid but empty DB. Then you could at least rebuild what you really need and after that do a garbage collection.

@mikepurvis As far as I can tell, the Refs table of the database cannot be recreated when it comes to runtime dependencies, which (unfortunately) prevents such a tool being possible. (I would love to be proven wrong, so I welcome counter info.)

As I understand it, the references stored in the Refs table for a .drv file are build-time dependencies, but for a directory(?)/output(?), they are runtime dependencies. Also, .drv files store build dependencies/references, but not runtime dependencies/references. Thus, there is no metadata from which those runtime dependencies could be derived.

This is a design problem, not an inherent one.

Given that one of the explicit selling points of Nix and NixOS is reliability, recoverability, the db being both a source of truth about state and also not treated with the same care as what's actually in /nix/store, I really think the design of the nix store sqlite db, and how its used could use some love. :pray:

Anyway, that's my 2c. Nix is amazing and thanks for all the hard work everyone does, I know no software is perfect.

If anyone is interested in the context: in my particular case I think my nix db got corrupted by a hibernation (and failed resume, thanks Nvidia) during a nixos-rebuild. This corruption was minor in that it was only an issue when trying to garbage collect, but otherwise allowed Nix to continue to function without any apparent difficulty. My system is set to do GC automatically, so I didn't get to see the error messages. One day I did and thought I'd managed to fix it. I had not, the issue came back, again unbeknownst to me. Eventually, cue a full EFI partition, and thus a failed nixos-rebuild activation. This was many, many generations later; spent 8 hours trying to manually fix the db, and seemed no closer to being done, eventually decided to just reinstall.

zeorin avatar Feb 07 '24 09:02 zeorin

I'll report back in February when I'm next able to physically access it

As promised...

After a power cycle the device came up normally with the new /nix subvolume mounted and seeming to function just fine. Currently trying a rebuild (no errors yet), and afterwards will scrub, if no errors I plan to delete the old /nix subvolume and consider myself lucky.

Looks like it was not a boot failure, journald logs intact from Dec 12 to today, probably just some kind of transient networking error preventing remote connection I would assume. Currently seems fixed on wlan and plugged in an ethernet cable to hopefully prevent going forward.

n8henrie avatar Feb 09 '24 00:02 n8henrie

Forgive me but isn't a format like sqlite supposed to prevent corruption in the face of power loss? Are there certain #pragmas to enable / disable for increased consistency of the db?

hraban avatar Feb 25 '24 05:02 hraban

If I understand the nix source code correctly, the default pragmas for its use of sqlite (on non-WSL systems) are:

pragma synchronous = normal
pragma main.journal_mode = wal

Relevant docs for synchronous:

NORMAL (1) When synchronous is NORMAL (1), the SQLite database engine will still sync at the most critical moments, but less often than in FULL mode. There is a very small (though non-zero) chance that a power failure at just the wrong time could corrupt the database in journal_mode=DELETE on an older filesystem. WAL mode is safe from corruption with synchronous=NORMAL, and probably DELETE mode is safe too on modern filesystems. WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash. Transactions are durable across application crashes regardless of the synchronous setting or journal mode. The synchronous=NORMAL setting is a good choice for most applications running in WAL mode.

— https://www.sqlite.org/pragma.html#pragma_synchronous

And for WAL:

Write transactions are very fast since they only involve writing the content once (versus twice for rollback-journal transactions) and because the writes are all sequential. Further, syncing the content to the disk is not required, as long as the application is willing to sacrifice durability following a power loss or hard reboot. (Writers sync the WAL on every transaction commit if PRAGMA synchronous is set to FULL but omit this sync if PRAGMA synchronous is set to NORMAL.)

— https://www.sqlite.org/wal.html

This suggests by default, assuming the FS has no consistency bugs, the db shouldn't get corrupted. "In theory." :/

Which is of course an unsatisfying answer.

hraban avatar Mar 01 '24 01:03 hraban