DIALOG_ERROR on first boot after some configuration

Open themitichris opened this issue 1 year ago • 14 comments

When installing Turnkey-Nextcloud, this error appears after setting the nextcloud-admin password. I tried resizing the window, tried different consoles, and also PuTTY; same problem. Template: debian-12-turnkey-nextcloud_18.0-1_amd64.tar.gz (two screenshots attached)

themitichris avatar Jan 29 '24 11:01 themitichris

Thanks for your bug report @themitichris.

That is very strange! The text of the error suggests to me that it was trying to draw a dialog that was unsupported by the size of the terminal window it was being displayed in. But looking at the size of your screenshots, those windows appear to be plenty big enough to me?! Did you resize them to take the screenshots? Or was that the size they were from the start?

I'm assuming that the first screenshot is from a Proxmox NoVNC window?

I'll download the Nextcloud template and test myself and see if I can reproduce it.

As a complete aside, assuming that you are using Windows locally, FWIW you can install OpenSSH client on Windows fairly easily these days. Whilst PuTTY is functional and should work fine, you'll probably have a better experience using a native OpenSSH client.

JedMeister avatar Jan 31 '24 20:01 JedMeister

FWIW, I can reproduce it if I make the window really small. I.e.:

Screenshot from 2024-02-01 07-47-50

But if I resize it back to larger and hit enter (i.e. the "ok" option displayed) it refreshes and it renders ok again:

Screenshot from 2024-02-01 07-48-07

The strange thing is though that the next dialog after entering the Nextcloud admin password (i.e. the dialog that you noted was the issue) is the domain setting page. And that doesn't have much text so I can make it tiny without causing me any issues?! (Although if I make it much smaller than that, the issue occurs, but that is expected - it's a limitation of the dialog interface).

Screenshot from 2024-02-01 07-56-45

FWIW, I normally use Firefox but I also tested in MS Edge just to be sure that it wasn't a browser specific issue (although I note you also tried PuTTY). Having said that my desktop runs Linux, so it may still be a Windows and/or browser quirk? That still doesn't explain why the same occurred in PuTTY.

Could you please try resizing the window and hitting enter. That should refresh the contents and if the window is large enough, it should "just work".

As an aside, I thought that we had implemented an error message for that issue?! There are limitations to what it can do, e.g. if the window is too small to display the short error message, then you'll still get a stacktrace, but it should be harder to make that happen. However I note that I wasn't getting the error message either?!
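For anyone following along, here is a rough sketch of the kind of size check being described: estimate whether some text would fit in a dialog box before trying to draw it. This is not the TurnKey dialog wrapper code. The function name `fits_in_terminal` and the `border` and `chrome_rows` allowances are assumptions for illustration, and dialog's real geometry rules are not reproduced here.

```python
import shutil
import textwrap


def fits_in_terminal(text, cols, rows, border=4, chrome_rows=6):
    """Estimate whether `text` fits in a dialog box on a cols x rows terminal.

    `border` and `chrome_rows` are guessed allowances for the box frame,
    title and buttons; they are not taken from dialog itself.
    """
    usable_cols = cols - border
    usable_rows = rows - chrome_rows
    if usable_cols < 1 or usable_rows < 1:
        return False
    wrapped = []
    for paragraph in text.splitlines() or [""]:
        # An empty paragraph still takes up one row on screen.
        wrapped.extend(textwrap.wrap(paragraph, width=usable_cols) or [""])
    return len(wrapped) <= usable_rows


# Check the password prompt text against the current terminal size.
size = shutil.get_terminal_size(fallback=(80, 24))
prompt = "Enter new password for the Nextcloud 'admin' account."
print(fits_in_terminal(prompt, size.columns, size.lines))
```

A short prompt passes this check on any reasonably sized terminal, whereas a prompt with a long stacktrace prepended to it would not.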

Any ideas @OnGle?

JedMeister avatar Jan 31 '24 21:01 JedMeister

Could you please try resizing the window and hitting enter. That should refresh the contents and if the window is large enough, it should "just work".

Tried, same problem, in all cases. I'm using chrome to manage Proxmox

themitichris avatar Jan 31 '24 21:01 themitichris

The first error indicates that it's trying to make a sub-window of width=1, which is very odd.

@themitichris You said after setting password? Were you prompted for domain? If not I imagine the error is there, but the code for prompting domain looks ok...

The only other thing I can think of is, assuming you can access your server's files (which might not be the case I guess), could you post your inithooks log (/var/log/inithooks.log)? That'll at least tell us where this error occurred.


Edit

I actually think the error occurs here: https://github.com/turnkeylinux-apps/nextcloud/blob/master/overlay/usr/lib/inithooks/bin/nextcloud.py#L64

        prefix = ''
        while True:
            password = d.get_password(
                "Nextcloud Password",
                prefix + "Enter new password for the Nextcloud 'admin' account.",
                pass_req=10)
            try:
                subprocess.run(
                         args = ['/usr/local/bin/turnkey-occ', 'user:resetpassword', '--password-from-env', 'admin'],
                         cwd='/var/www/nextcloud',
                         env={"OC_PASS": password},
                         text=True,
                         capture_output=True,
                         check=True)
            except subprocess.CalledProcessError as e:
                prefix = e.stderr + e.stdout + '\n'
            else:
                break

My guess is the output from turnkey-occ is pretty large, likely because some issue has occurred inside turnkey-occ itself, and the sub-window might just be the actual input line, which is being pushed outside of the terminal window. Also it's not uncommon for terminal apps to use coordinates in y, x rather than x, y (because curses does it that way) which would mean its height rather than width is 1, and that'd be consistent with the input line size.

If that's the case, then we should just use a separate window, like a pager or something to display the error.
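As a rough illustration of that idea, one option is to never put the raw command output into the password prompt, and to show only a short, clipped summary instead. This is only a sketch, not what nextcloud.py does. The helper name `summarize_output` and its default limits are hypothetical.

```python
def summarize_output(output, max_lines=5, max_width=70):
    """Return at most `max_lines` non-empty lines, each clipped to `max_width`."""
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]
    clipped = [
        line if len(line) <= max_width else line[: max_width - 3] + "..."
        for line in lines[:max_lines]
    ]
    hidden = len(lines) - len(clipped)
    if hidden > 0:
        # Tell the user that something was left out and where to find it.
        clipped.append(f"[{hidden} more line(s) omitted; see the log]")
    return "\n".join(clipped)
```

In the loop above, the prefix could then be built from `summarize_output(e.stderr + e.stdout)` rather than from the full output, so the input line stays on screen even when turnkey-occ prints a long trace.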

OnGle avatar Jan 31 '24 23:01 OnGle

Thanks for the additional info @themitichris.

And I assume that you are using Proxmox? Or are you using LXC running some other way? (TBH it probably doesn't make any difference but the more info the better).

As @ongle noted, it'd be useful if you can share your inithooks logfile (/var/log/inithooks.log) too.

Also, if you're willing to share the password(s) you used that might be useful.

If we can reproduce the issue, then we should be able to fix it.

Re sharing info too, anything you'd rather not post publicly, please feel free to email support AT turnkeylinux.org.


Thanks too @OnGle.

As discussed privately, once you have your current task resolved, it'd be awesome if you can have a closer look at this.

Perhaps if we get a non zero exit code from turnkey-occ we need to explicitly display the result in a msgbox? (TBH I thought that was what it did already - but I'm likely missing something). I'm sure you're already aware but just in case, a msgbox should create a scrollable window (using pageup/pagedown) if it gets too much text - FYI here's the upstream docs (although note that is the docs for v3.3.5 and we have v3.5.1).

Having said all that, I tried using a "bad" password and it displayed the expected error message (i.e. no stacktrace). And with a good password it also worked as expected. I didn't try exhaustively though, so an edge case is quite possible.

JedMeister avatar Feb 01 '24 03:02 JedMeister

I've got the same issue, although it doesn't help if I resize the window to be bigger. Maybe my screen size isn't big enough or something. I don't know how the terminal size is handled. I am also running LXC on ProxMox and am trying to install nextcloud.

Louisbertelsmann avatar Feb 23 '24 14:02 Louisbertelsmann

@Louisbertelsmann - Thanks for adding your comment.

TL;DR if you're willing to share the password you tried that would likely allow me to reproduce this. And if I can reproduce it, then I'm certain that we can fix it. If you don't want to post publicly, then please email me (email in my GH profile).


More detailed response:

Whilst this can occur purely from having too small a window for the content displayed, the number of people who have now complained about this suggests that it is something specific to setting the Nextcloud password. FWIW Nextcloud is somewhat unique in that it has its own fairly strict password validation mechanism (i.e. doesn't just have simple password requirements such as character types and length). Ultimately that is a good thing (as it makes "bad" passwords practically impossible) but I suspect that the password(s) you and others are using are causing a large amount of output that we aren't anticipating.

Unfortunately I've not been able to reproduce the issue, but I suspect that @ongle is onto something (in his comment above). If you (and/or others) can share the password(s) that cause it, I should be able to. That will help to diagnose the issue and also to test that we've actually fixed it.

Even if we still can't reproduce it, we can make the improvements that @OnGle has suggested and make feedback and/or logging better. I'm pretty sure that will improve things, even if it doesn't solve them.

Also FWIW @OnGle has been caught up with another job and then has been unwell. As soon as he's back onboard, I'll get him to prioritize this as it's clearly causing pain to quite a few of you!

JedMeister avatar Feb 25 '24 00:02 JedMeister

@Louisbertelsmann - Also if you still have this container, would you be able to run:

journalctl -t inithooks

If you can, please double check before you post that it doesn't include anything sensitive (e.g. email, password, etc) and if so redact those bits. Although please clearly note redactions, so I'm aware which bits aren't verbatim. Thanks

JedMeister avatar Feb 25 '24 00:02 JedMeister

@JedMeister I used this password: aqp6HUTNfU It was generated for testing purposes and I specifically didn't use special characters so it hopefully wouldn't interfere with anything. I tried many different passwords and actually don't know why you can't reproduce this. Anyway, here's the log of inithooks.

Feb 26 08:59:55 nextcloud-test inithooks[786]: [01ipconfig] running
Feb 26 08:59:55 nextcloud-test inithooks[788]: [01ipconfig] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[790]: [05autogrow-fs] running
Feb 26 08:59:55 nextcloud-test inithooks[795]: [05autogrow-fs] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[798]: [09hostname] running
Feb 26 08:59:55 nextcloud-test inithooks[813]: [09hostname] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[817]: [10randomize-cronapt] running
Feb 26 08:59:55 nextcloud-test inithooks[823]: [10randomize-cronapt] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[827]: [10randomize-crontab] running
Feb 26 08:59:55 nextcloud-test inithooks[838]: [10randomize-crontab] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[840]: [10regen-sshkeys] running
Feb 26 08:59:55 nextcloud-test inithooks[855]: [10regen-sshkeys] successfully completed
Feb 26 08:59:55 nextcloud-test inithooks[857]: [15regen-sslcert] running
Feb 26 08:59:58 nextcloud-test inithooks[2646]: [15regen-sslcert] successfully completed
Feb 26 08:59:58 nextcloud-test inithooks[2648]: [20regen-nextcloud-secrets] running
Feb 26 08:59:59 nextcloud-test inithooks[2669]: [20regen-nextcloud-secrets] successfully completed
Feb 26 08:59:59 nextcloud-test inithooks[2671]: [29preseed] running
Feb 26 08:59:59 nextcloud-test inithooks[2678]: [29preseed] successfully completed
Feb 26 08:59:59 nextcloud-test inithooks[2680]: [29sudoadmin] running
Feb 26 08:59:59 nextcloud-test inithooks[2736]: [29sudoadmin] successfully completed
Feb 26 08:59:59 nextcloud-test inithooks[2738]: [29tagid] running
Feb 26 08:59:59 nextcloud-test inithooks[2760]: [29tagid] successfully completed
Feb 26 08:59:59 nextcloud-test inithooks[2762]: [30rootpass] skipping
Feb 26 08:59:59 nextcloud-test inithooks[2764]: [30turnkey-init-fence] running
Feb 26 09:00:00 nextcloud-test inithooks[2858]: [30turnkey-init-fence] successfully completed
Feb 26 09:00:00 nextcloud-test inithooks[2860]: [35adminer-mysqlpass] running
Feb 26 09:00:00 nextcloud-test inithooks[2870]: [35adminer-mysqlpass] successfully completed
Feb 26 09:00:00 nextcloud-test inithooks[2872]: [35postfix-unprivileged] running
Feb 26 09:00:02 nextcloud-test inithooks[3212]: [35postfix-unprivileged] successfully completed
Feb 26 09:00:02 nextcloud-test inithooks[3214]: [40nextcloud] running
Feb 26 09:00:02 nextcloud-test inithooks[3222]: [40nextcloud] successfully completed
Feb 26 09:00:02 nextcloud-test inithooks[3224]: [80hub-services] running
Feb 26 09:00:02 nextcloud-test inithooks[3226]: [80hub-services] successfully completed
Feb 26 09:00:02 nextcloud-test inithooks[3228]: [85secalerts] running
Feb 26 09:00:02 nextcloud-test inithooks[3230]: [85secalerts] successfully completed
Feb 26 09:00:02 nextcloud-test inithooks[3232]: [92etckeeper] running
Feb 26 09:00:03 nextcloud-test inithooks[3464]: [92etckeeper] successfully completed
Feb 26 09:00:03 nextcloud-test inithooks[3466]: [95secupdates] running
Feb 26 09:00:03 nextcloud-test inithooks[3469]: [95secupdates] successfully completed
Feb 26 09:00:03 nextcloud-test inithooks[3471]: [97turnkey-init-fence-disable] running
Feb 26 09:00:03 nextcloud-test inithooks[3473]: [97turnkey-init-fence-disable] successfully completed
Feb 26 09:00:03 nextcloud-test inithooks[3475]: [98finalize] running
Feb 26 09:00:03 nextcloud-test inithooks[3478]: [98finalize] successfully completed
Feb 26 09:00:03 nextcloud-test inithooks[3480]: [99reboot] skipping
Feb 26 09:00:03 nextcloud-test inithooks[3485]: [01empty] skipping
Feb 26 09:00:03 nextcloud-test inithooks[3486]: Inithook run completed, exiting.
Feb 26 09:01:04 nextcloud-test inithooks[3519]: [01empty] skipping
Feb 26 09:01:04 nextcloud-test inithooks[3520]: Inithook run completed, exiting.
Feb 26 09:02:04 nextcloud-test inithooks[3552]: [01empty] skipping
Feb 26 09:02:04 nextcloud-test inithooks[3553]: Inithook run completed, exiting.

Louisbertelsmann avatar Feb 26 '24 09:02 Louisbertelsmann

Thanks @Louisbertelsmann - very much appreciated!

Given that password, I too am super surprised I can't reproduce!

I'm also surprised to see:

Feb 26 09:00:02 nextcloud-test inithooks[3214]: [40nextcloud] running
Feb 26 09:00:02 nextcloud-test inithooks[3222]: [40nextcloud] successfully completed

Given what has been shared (by yourself and others) - I'm almost certain that it's [40nextcloud] (/usr/lib/inithooks/firstboot.d/40nextcloud which calls /usr/lib/inithooks/bin/nextcloud.py) that is throwing the stacktrace!
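For anyone wanting to check their own log the same way, here is a small sketch that pairs up the "running" and "successfully completed" entries per hook. It assumes the line format shown in the log above. The function names are made up for illustration and are not part of inithooks.

```python
import re

# Matches e.g. "... inithooks[3214]: [40nextcloud] running"
LINE_RE = re.compile(r"inithooks\[\d+\]: \[(?P<hook>[^\]]+)\] (?P<status>.+)$")


def hook_statuses(log_text):
    """Map each hook name to the list of statuses logged for it, in order."""
    statuses = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            statuses.setdefault(match["hook"], []).append(match["status"].strip())
    return statuses


def unfinished_hooks(log_text):
    """Return hooks that logged 'running' but never 'successfully completed'."""
    return [
        hook
        for hook, seen in hook_statuses(log_text).items()
        if "running" in seen and "successfully completed" not in seen
    ]


SAMPLE = """\
Feb 26 09:00:02 nextcloud-test inithooks[3214]: [40nextcloud] running
Feb 26 09:00:02 nextcloud-test inithooks[3222]: [40nextcloud] successfully completed
Feb 26 09:00:03 nextcloud-test inithooks[3485]: [01empty] skipping
"""
print(unfinished_hooks(SAMPLE))
```

For the lines above this prints an empty list, which matches the surprise here: going by the log alone, 40nextcloud looks fine, so a "successfully completed" entry may not be enough to rule a hook out.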

I suspect that it's unrelated but the last bit is weird too:

Feb 26 09:00:03 nextcloud-test inithooks[3485]: [01empty] skipping
Feb 26 09:00:03 nextcloud-test inithooks[3486]: Inithook run completed, exiting.
Feb 26 09:01:04 nextcloud-test inithooks[3519]: [01empty] skipping
Feb 26 09:01:04 nextcloud-test inithooks[3520]: Inithook run completed, exiting.
Feb 26 09:02:04 nextcloud-test inithooks[3552]: [01empty] skipping
Feb 26 09:02:04 nextcloud-test inithooks[3553]: Inithook run completed, exiting.

FWIW we now have automated smoke testing and that showed no sign of any problems either!? It is done within a Docker container, but that should have no real relevance here (should be very close to LXC env for our purposes).

During the smoke tests we use the password TurnKey12? (see https://github.com/turnkeylinux/tkldev-docker/blob/master/inithooks.conf).

JedMeister avatar Feb 26 '24 22:02 JedMeister

Maybe the docker image uses some different package versions?

Louisbertelsmann avatar Feb 27 '24 05:02 Louisbertelsmann

I don't think that's possible. The build process goes like this: First we build an ISO. Then we dump the rootfs from the ISO into a Docker container for smoke testing (all performed by bt-iso). Then to do the Proxmox/LXC builds, we extract the rootfs from that same ISO into a tarball for LXC (using bt-container).

I'm not sure if it's a factor, but I've been running proxmox for a very long time and have always just upgraded it. I think the last time I did a clean install on my main server was v3.x. Although I do have a new node that is running v8.x (installed as v7.x) and I haven't actually tested it on that. So that may be a factor? I'd be surprised TBH, but perhaps?

Regardless, I'll download a fresh copy on my newer node and try again to reproduce. I'll keep you posted.

JedMeister avatar Feb 27 '24 09:02 JedMeister

That could be a problem. I think the LXC versions are different, and maybe the new one has a problem with the window size.

Louisbertelsmann avatar Feb 27 '24 15:02 Louisbertelsmann

Is there any news on this? Can I help you fix this?

Louisbertelsmann avatar Mar 12 '24 08:03 Louisbertelsmann

Ok, I've finally worked this one out.

It occurs when Nextcloud throws an exception when trying to set the password and the Nextcloud stacktrace is too big for the firstboot dialog to display.

From my testing, the exception is most likely caused by Redis not running, although it can also be caused by MariaDB not running. When you try to set a password and either (or both) of these services are not running, Nextcloud will error.

I have refactored the Nextcloud firstboot script to capture the stacktrace and write it to the log, rather than trying to display it. It also now gives specific feedback when that happens and allows the user to skip setting a password, so the firstboot can complete. The log will include all info related to errors.
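To give a feel for the general pattern (this is a simplified sketch, not the code in the PR): run the command, send the full output to the log on failure, and hand only a short message back to the dialog. The helper name `run_logged` and the logger name are hypothetical.

```python
import logging
import subprocess
import sys

log = logging.getLogger("nextcloud-firstboot")  # hypothetical logger name


def run_logged(args, **kwargs):
    """Run a command; on failure log the full output and return a short message.

    Returns (True, "") on success, or (False, message) on failure.
    Only args[0] is logged, so secrets passed via env are not written out.
    """
    try:
        subprocess.run(args, text=True, capture_output=True, check=True, **kwargs)
    except subprocess.CalledProcessError as e:
        log.error(
            "%s failed (exit %s)\nstdout:\n%s\nstderr:\n%s",
            args[0], e.returncode, e.stdout, e.stderr,
        )
        return False, (
            f"Setting the password failed (exit code {e.returncode}). "
            "See the log for details."
        )
    except OSError as e:
        log.error("could not start %s: %s", args[0], e)
        return False, f"Could not run {args[0]}."
    return True, ""


# Simulate a command that fails with a very long trace on stderr.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('trace\\n' * 200); sys.exit(1)"]
ok, message = run_logged(failing)
print(ok, message)
```

The message handed back stays one line long no matter how much the command printed, so it can be shown in a dialog of any reasonable size, while the full trace is still available in the log.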

The PR can be seen here: https://github.com/turnkeylinux-apps/nextcloud/pull/28

That still won't actually solve the root issue of services not running, but will at least solve this issue. And it should be a step in the right direction.

After having a look at the default redis-server.service file, my guess is that Redis is failing in a container because of some service hardening which is not compatible with unprivileged containers. I have tweaked our buildscripts to override some of those that I'm pretty sure are causing issues, although it may need more. See that here: https://github.com/turnkeylinux/buildtasks/pull/87

Once all these changes have been merged I will rebuild Nextcloud.

JedMeister avatar Jul 22 '24 07:07 JedMeister