
Reboot broken on ARM64 virtual servers

Open anisse opened this issue 6 years ago • 19 comments

A software reboot on an ARM64 VPS causes this message to be repeated indefinitely on the serial console:

IRQ Exception at 0x00000000BBC11F38


IRQ Exception at 0x00000000BBC11F38

The only way to restart the server is then to use the API (e.g. scw restart) or the web interface.
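For anyone scripting around this, here is a minimal Python sketch of spotting the loop in captured serial console text. How the console text is captured is left out, and `looks_stuck` and its `threshold` parameter are hypothetical names for illustration, not part of any Scaleway tool.

```python
import re

# Matches the firmware messages quoted in this thread, e.g.
# "IRQ Exception at 0x00000000BBC11F38".
EXCEPTION_RE = re.compile(r"^(IRQ|Synchronous) [Ee]xception at 0x[0-9A-Fa-f]+$")


def looks_stuck(console_lines, threshold=3):
    """Return True if the last `threshold` non-blank lines are the same exception message."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # Ignore the blank lines the console prints between messages.
    tail = [line.strip() for line in console_lines if line.strip()][-threshold:]
    if len(tail) < threshold:
        return False
    same_message = len(set(tail)) == 1
    return same_message and all(EXCEPTION_RE.match(line) for line in tail)
```

The threshold is there so that a single exception line followed by a normal boot is not treated as a hang.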

Please tell me if this issue should be moved to another repo.

anisse avatar Dec 20 '18 21:12 anisse

It's a BIOS issue. This is a known issue and their team is working on it, but there is no ETA. I've had this issue for a long time.

minhng99 avatar Dec 31 '18 12:12 minhng99

Same here: an instance with IPv4 + IPv6 boots successfully only on every second "reboot". The other times it goes into an infinite loop with this error message:

IRQ Exception at 0x00000000BBC11F38

Another problem connected with this bug: after the instance is created, the web interface shows the system as available, but it isn't, because of this bug.

vazhnov avatar Mar 05 '19 08:03 vazhnov

It looks like the problem only affects Scaleway's Ubuntu Bionic Beaver image for ARM64. I tried some Debian images (sid and Stretch) and they don't have this problem, neither in their original state nor after a full upgrade. I also tried the Ubuntu Xenial image and had only one freeze on soft reboot, without any message. After that, I upgraded that Ubuntu 16.04 to 18.04, and again had no freezes on reboot.

P.S. With Ubuntu 18.04 upgraded from 16.04, I got an infinite loop on reboot with this message:

Synchronous exception at 0x00000000BBC129BC

Bootscript is aarch64 mainline 4.15.11 rev1.

vazhnov avatar Mar 06 '19 18:03 vazhnov

I have the same problem with Debian Stretch on an ARM64-2GB in Paris.

The bootscript is not part of the problem: I changed it to every possible value without any luck.

ThomasBlt avatar Mar 06 '19 20:03 ThomasBlt

Also having this issue ... At least we know that this problem only happens every other reboot.

But the real issue is this: what if the server is rebooted without my knowledge? It will get stuck with this IRQ Exception at 0x00000000BBC11F38 until I notice my service is not running. Until then, no services run on the instance.
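One possible stopgap for the situation described above is an external watchdog, run from another machine, that issues a hard restart when the instance stops answering. The sketch below is Python and makes several assumptions: `is_reachable` and `restart_if_down` are hypothetical helper names, the probe is a plain TCP connect, and the restart command is whatever you pass in (for example the scw restart command mentioned earlier in this thread, whose exact arguments depend on your CLI version).

```python
import socket
import subprocess


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_if_down(host, port, restart_cmd, checks=3, probe=is_reachable):
    """Run `restart_cmd` only if `checks` probes in a row fail.

    `restart_cmd` is an argument list supplied by the caller (an assumption:
    it performs a hard restart through the provider's API or CLI).
    Returns True if the restart command was run, False otherwise.
    """
    for _ in range(checks):
        if probe(host, port):
            return False  # the instance answered, nothing to do
    subprocess.run(restart_cmd, check=True)
    return True
```

Running this from cron every few minutes would bound the downtime, though it only works around the hang and does not fix it. Requiring several failed probes avoids restarting on a single dropped connection.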

Scaleway, please fix this ... ARM64 is a very real opportunity.

eliezedeck avatar Mar 24 '19 04:03 eliezedeck

I also have this issue on Scaleway with Ubuntu 18.04. Same error message as @eliezedeck.

jronnblom avatar May 12 '19 06:05 jronnblom

This issue is now 7 months old. Has any progress been made? I like ARM, but things like this make the whole server lineup completely useless.

Fornax96 avatar Jun 19 '19 08:06 Fornax96

I'm facing this issue today too.

smilexth avatar Jun 21 '19 06:06 smilexth

Same here

Xstoudi avatar Jun 24 '19 10:06 Xstoudi

I'm getting this as well, this is a serious problem y'all

James-E-A avatar Jul 17 '19 00:07 James-E-A

Just got this message from support:

However, I understand that you have issues rebooting from within your environment.
This is another issue, caused by the fact that your OS detaches volumes when powering off the server.
Since volumes are network-connected, they cannot be re-attached on reboot, and require a hard-reboot from the Scaleway panel to re-attach volume, hence why it does not work.

abitrolly avatar Aug 03 '19 05:08 abitrolly

It is not clear why the hardware hypervisor cannot detect a reboot request and start handling the reboot process itself when it receives this IRQ Exception at 0x00000000BBC11F38.

abitrolly avatar Aug 08 '19 08:08 abitrolly

Same problem here "IRQ Exception at 0x000000013BC11F38"

gaiar avatar Aug 13 '19 12:08 gaiar

Same here

jakolehm avatar Aug 14 '19 08:08 jakolehm

Same here. Wake up Scaleway.

Preen avatar Aug 19 '19 08:08 Preen

@Preen The Scaleway team is working on new hardware, which is promised soon: https://community.scaleway.com/t/arm64-hangs-with-ubuntu-18-04-fully-updated/7681/6

abitrolly avatar Sep 02 '19 05:09 abitrolly

Is there anything we as users can do to help? Testing? Reports? A bug bounty? I still encounter this occasionally when using the 4.9 bootscript, and I'd really like to use some features that are only available in newer kernels.

evanfoster avatar Dec 15 '19 02:12 evanfoster

The message says that the problem should be gone with kernel 5. Did anybody get a chance to test whether it is really fixed?

abitrolly avatar Jul 16 '20 12:07 abitrolly

I've just discovered that ARM64 hosting failed - https://www.scaleway.com/en/docs/migrate-c2-arm64-to-virtual-instance-using-rsync/

Well, so long and thanks for all the fish then. )

abitrolly avatar Jul 16 '20 13:07 abitrolly