dqlite requires a large stack
Attempting to bootstrap a Juju controller on arm64 with the latest version of dqlite (1.16.7) causes a segfault. The build uses musl to statically compile jujud-controller, and the snapcraft.yaml contains all the dependencies needed to build the binary.
The CI run in question: https://github.com/juju/juju/actions/runs/10212413197/job/28255682924?pr=17836
Repro steps:
This must be done on aarch64 (arm64). I used Multipass on an M1 Mac, but an AWS Graviton machine would work just as well.
Prerequisites:
- git
- build-essential (make, etc.)
- lxd (you might need to run sudo lxd init --auto to initialise it first)
- snapcraft (install it as a snap)
$ git clone https://github.com/juju/juju.git
$ cd juju
$ git fetch origin pull/17836/head:pr-17836
$ git checkout pr-17836
$ snapcraft --use-lxd
$ sudo snap install *.snap --dangerous
$ sudo snap connect juju:lxd lxd
$ sudo snap connect juju:config-lxd
$ sudo snap connect juju:dot-local-share-juju
$ sudo snap connect juju:ssh-keys
$ juju bootstrap lxd test --keep-broken
The segfault should occur at this point.
$ lxc exec <juju container name> -- bash
$ apt install gdb
$ LIBDQLITE_TRACE=1 gdb --args /var/lib/juju/tools/3.6-beta2.1-ubuntu-arm64/jujud bootstrap-state --timeout 20m0s --data-dir '/var/lib/juju' --debug '/var/lib/juju/bootstrap-params'
The backtrace for the segfault: https://paste.ubuntu.com/p/FcY8WZ9GT3/
A follow-up to this can be found here.
TL;DR: dqlite needs more than 128 KiB of stack, which is musl's default thread stack size.
I'd be happy to close this, provided we add a note to the docs or README.md advising that musl builds need a larger default stack size.
@SimonRichardson I've updated the README in #700. I was planning to keep this issue open as a reminder to myself to investigate why dqlite needs a large stack; there might be another issue like the one with EXEC_SQL lurking.
Can we re-evaluate this? We've come a long way since this was opened, and I don't think it's true anymore that dqlite requires a large stack.