ostree
ostree copied to clipboard
`ostree admin` fails if `/ostree/deploy` has mtime 0
Hi everyone, I have an issue that seems hard to describe, I'll do my best to explain what I did:
I'm building a debian-based OSTree system. I'm using the ostree package from debian testing.
My system does boot successfully and it looks good on the first sight.
I can run ostree
commands like:
root@e931861b90bf:~# ostree refs
gardenlinux/today/arm64
ostree/1/1/0
root@e931861b90bf:~# ostree log gardenlinux/today/arm64
commit 54d0898411d9a709685bc1ff6b3b4fafbdc16402c8c4e014ad84488b7e39fb59
ContentChecksum: 9a7c454a3150e9d42e9ccf1743c5b51c4e588799423e13235d4c95882f4bc993
Date: 2023-08-28 14:42:53 +0000
Gardenlinux build Mon Aug 28 14:42:50 UTC 2023
root@e931861b90bf:~# ostree ls 54d0898411d9a709685bc1ff6b3b4fafbdc16402c8c4e014ad84488b7e39fb59
d00755 0 0 0 /
l00777 0 0 0 /bin -> usr/bin
l00777 0 0 0 /home -> var/home
l00777 0 0 0 /lib -> usr/lib
l00777 0 0 0 /ostree -> sysroot/ostree
l00777 0 0 0 /root -> var/roothome
l00777 0 0 0 /sbin -> usr/sbin
... (output abbreviated)
But as soon as I'm trying to run any of the ostree admin
subcommands, I get an error because of a failed assertion:
root@e931861b90bf:~# ostree admin status
**
OSTree:ERROR:src/libostree/ostree-sysroot.c:1333:ostree_sysroot_get_deployments: assertion failed: (self->loadstate == OSTREE_SYSROOT_LOAD_STATE_LOADED)
Bail out! OSTree:ERROR:src/libostree/ostree-sysroot.c:1333:ostree_sysroot_get_deployments: assertion failed: (self->loadstate == OSTREE_SYSROOT_LOAD_STATE_LOADED)
Aborted
This does seem to happen with any sub-command of ostree admin
.
root@e931861b90bf:~# ostree admin cleanup
**
OSTree:ERROR:src/libostree/ostree-sysroot.c:1284:ostree_sysroot_get_booted_deployment: assertion failed: (self->loadstate == OSTREE_SYSROOT_LOAD_STATE_LOADED)
Bail out! OSTree:ERROR:src/libostree/ostree-sysroot.c:1284:ostree_sysroot_get_booted_deployment: assertion failed: (self->loadstate == OSTREE_SYSROOT_LOAD_STATE_LOADED)
Aborted
It looks like loadstate
is only assigned the OSTREE_SYSROOT_LOAD_STATE_LOADED
value in a single location in ostree-sysroot.c
I'm not sure where this is supposed to be called. I've tried to trace the function calls, and the most likely chain seems to be this:
ot_admin_instutil_builtin_set_kargs
ostree_sysroot_deployment_set_kargs
ostree_sysroot_write_deployments
ostree_sysroot_write_deployments_with_options
write_deployments_finish
ostree_sysroot_stage_tree_with_options
ostree_sysroot_load
ostree_sysroot_load_if_changed
sysroot_load_from_bootloader_configs
self->loadstate = OSTREE_SYSROOT_LOAD_STATE_LOADED
I would expect this to happen at image build time when I call ostree admin deploy
, but it does not look like this.
Verbose output of ostree admin deploy
does not show any hint of an issue, here is an abbreviated output of that:
OT: Keeping needed object cc8a8101ce21826e2e05412343c2f5a47390d7af242efed38857766dc8d207d7.file
OT: Pushing lock non-blocking with timeout 30
OT: Push lock: state=exclusive, depth=2
OT: Repo already locked exclusive, maintaining state
OT: Popping lock non-blocking with timeout 30
OT: Pop lock: state=exclusive, depth=3
OT: Maintaining lock state as exclusive
OT: Popping lock non-blocking with timeout 30
OT: Pop lock: state=exclusive, depth=2
OT: Maintaining lock state as exclusive
OT: Popping lock non-blocking with timeout 30
OT: Pop lock: state=exclusive, depth=1
OT: Unlocking repo
Can someone help me figure out why this assertion is failing? At what point is loadstate
supposed to be set to OSTREE_SYSROOT_LOAD_STATE_LOADED
?
Maybe relevant to mention: It's booting using systemd-boot (despite this not being officially supported, as documented here). This might be an issue when loading the repository, I guess.
I've made some progress in debugging this:
inside ostree_sysroot_load_if_changed
there is an early exit based on timestamps: self->loaded_ts.tv_sec == stbuf.st_mtim.tv_sec && self->loaded_ts.tv_nsec == stbuf.st_mtim.tv_nsec
. This condition seems to be true in my case, when I rebuild the package with this block disabled, I can run any ostree admin
command. I don't know why this is true in my case when it should not be true.
What does stat /ostree/deploy
say? I'm not sure if that's what's happening here, but regardless we should probably tweak this logic so it can handle the case where the timestamp of /ostree/deploy
is epoch. (Likely from being normalized at build time or something.)
Hi @jlebon, oh, yeah, has not occurred to me.
root@garden:~# stat /ostree/deploy
File: /ostree/deploy
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 254,2 Inode: 28 Links: 3
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 1970-01-01 00:00:00.000000000 +0000
Modify: 1970-01-01 00:00:00.000000000 +0000
Change: 1970-01-01 00:00:00.000000000 +0000
Birth: 2023-09-19 14:33:31.000000000 +0000
I think this is because in Garden Linux we want images to be reproducible which is why the timestamp is set to 0 on purpose when creating the filesystems.. Probably that's why the check is true..
Okay, I've verified that this is the issue. I've learned that our tooling for making filesystems can set an arbitrary timestamp, and if I set it to a non-zero timestamp the issue does not occur.
This is the commit that mitigates the issue
The commit before that allows to reproduce the issue, if that is relevant to anyone.
That's good enough for my poc use-case, but maybe we should still leave this issue open if it makes sense to make this condition more robust for such cases?
That's good enough for my poc use-case, but maybe we should still leave this issue open if it makes sense to make this condition more robust for such cases?
It makes sense to me. I've retitled the issue.