dqlite requires a large stack

Open SimonRichardson opened this issue 1 year ago • 4 comments

Attempting to bootstrap Juju controller on arm64 with the latest version (1.16.7) of dqlite causes a segfault. This uses musl to statically compile the jujud-controller. The snapcraft.yaml contains all the dependencies for building the binary.

The CI run in question: https://github.com/juju/juju/actions/runs/10212413197/job/28255682924?pr=17836

Repro steps...

This has to be on aarch64 (arm64). I used multipass on an M1 Mac, but it could just as easily be done on an AWS Graviton machine.

Prerequisites:

  1. git
  2. build-essential (make, etc.)
  3. lxd (you might need to run sudo lxd init --auto first)
  4. snapcraft (grab from snap)
$ git checkout https://github.com/juju/juju/pull/17836
$ snapcraft --use-lxd
$ sudo snap install *.snap --dangerous
$ sudo snap connect juju:lxd lxd
$ sudo snap connect juju:config-lxd
$ sudo snap connect juju:dot-local-share-juju
$ sudo snap connect juju:ssh-keys
$ juju bootstrap lxd test --keep-broken

The bootstrap should then segfault.

$ lxc exec <juju container name> -- bash
$ apt install gdb
$ LIBDQLITE_TRACE=1 gdb /var/lib/juju/tools/3.6-beta2.1-ubuntu-arm64/jujud bootstrap-state --timeout 20m0s --data-dir '/var/lib/juju' --debug '/var/lib/juju/bootstrap-params'

The backtrace for the segfault: https://paste.ubuntu.com/p/FcY8WZ9GT3/

SimonRichardson avatar Aug 07 '24 08:08 SimonRichardson

A follow-up to this can be found here.

TLDR: dqlite needs more than 128k of stack memory, musl's default.

hpidcock avatar Aug 08 '24 07:08 hpidcock

I'd be happy to close this, with an advisory that if using musl you need to increase the default stack size in some docs or README.md.

SimonRichardson avatar Sep 03 '24 07:09 SimonRichardson

@SimonRichardson I've updated the README in #700. I was planning to keep this issue open as a reminder to myself to investigate why dqlite needs a large stack---there might be another issue like the one with EXEC_SQL lurking.

cole-miller avatar Sep 03 '24 15:09 cole-miller

Can we re-evaluate this? I think we've come a long way since this was opened, and I don't think it is true anymore that we require a large stack.

marco6 avatar Oct 08 '25 11:10 marco6