multipass icon indicating copy to clipboard operation
multipass copied to clipboard

No irq handler for vector

Open jasonmccallister opened this issue 5 years ago • 47 comments

Describe the bug I have been using multipass for a while and I started getting these alerts randomly (they seem to have increased in frequency in the last week). We built a wrapper CLI around multipass for local development: https://craftcms.com/blog/nitro-1-0-released

Not sure where to start troubleshooting, but I have had multiple reports from our users that they are seeing the same somewhat randomly.

To Reproduce

  1. multipass exec nitro-dev -- journalctl -u nitrod -f

Expected behavior The content of journalctl are output, but then I start receiving terminal bells with:

Message from syslogd@nitro-dev at Oct  2 12:13:49 ...
 kernel:[   70.010266] do_IRQ: 3.37 No irq handler for vector

Additional info

  • OS: macOS 10.15.6
  • multipass 1.4.0+mac
  • multipassd 1.4.0+mac

jasonmccallister avatar Oct 02 '20 16:10 jasonmccallister

Hi @jasonmccallister,

What version of Ubuntu are you using in the instances? If you are just using the default, there may be a change since default switch from 18.04 to 20.04 fairly recently (via cloud-images).

Also, I'm not what device is supposed to be on IRQ 3. Could you please post the output of sudo lspci -vvv in the instance?

townsend2010 avatar Oct 05 '20 13:10 townsend2010

@townsend2010 I'm on Ubuntu 20.04.1 LTS and the content of lspci is:

00:00.0 Host bridge: Network Appliance Corporation Device 1275
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Expansion ROM at <unassigned> [disabled] [size=2K]
	Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE-
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		RootCap: CRSVisible-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd-
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

00:01.0 SCSI storage controller: Red Hat, Inc. Virtio block device
	Subsystem: Red Hat, Inc. Virtio block device
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 2000 [size=128]
	Region 1: Memory at c0000000 (32-bit, non-prefetchable) [size=8K]
	Expansion ROM at c0002800 [virtual] [disabled] [size=2K]
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00001000
	Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: virtio-pci

00:01.1 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin B routed to IRQ 24
	Region 5: Memory at c0002000 (32-bit, non-prefetchable) [size=1K]
	Expansion ROM at c0003000 [virtual] [disabled] [size=2K]
	Capabilities: [40] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 4025
	Kernel driver in use: ahci
	Kernel modules: ahci

00:02.0 Ethernet controller: Red Hat, Inc. Virtio network device
	Subsystem: Red Hat, Inc. Virtio network device
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 17
	Region 0: I/O ports at 2080 [size=32]
	Region 1: Memory at c0004000 (32-bit, non-prefetchable) [size=8K]
	Expansion ROM at c0003800 [virtual] [disabled] [size=2K]
	Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00001000
	Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: virtio-pci

00:05.0 Network and computing encryption device: Red Hat, Inc. Virtio RNG
	Subsystem: Red Hat, Inc. Virtio RNG
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 20
	Region 0: I/O ports at 20a0 [size=32]
	Region 1: Memory at c0006000 (32-bit, non-prefetchable) [size=8K]
	Expansion ROM at c0008000 [virtual] [disabled] [size=2K]
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00001000
	Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: virtio-pci

00:1f.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Expansion ROM at c0008800 [virtual] [disabled] [size=2K]

jasonmccallister avatar Oct 05 '20 14:10 jasonmccallister

Thanks! How about the output of cat /proc/interrupts as well?

townsend2010 avatar Oct 05 '20 14:10 townsend2010

Here it is. It only happens randomly.

           CPU0       CPU1       CPU2       CPU3
  4:          0          0       1696          0   IO-APIC   4-edge      ttyS0
  8:          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0   IO-APIC   9-fasteoi   acpi
 24:          0          0          0        177   PCI-MSI 18432-edge      ahci[0000:00:01.1]
 25:          0          0          0          0   PCI-MSI 16384-edge      virtio0-config
 26:          0          0          0      90303   PCI-MSI 16385-edge      virtio0-req.0
 27:          0          0          0          0   PCI-MSI 32768-edge      virtio1-config
 28:          0          0      17276          0   PCI-MSI 32769-edge      virtio1-input.0
 29:          0      82353          0        199   PCI-MSI 32770-edge      virtio1-output.0
 30:          0          0          0          0   PCI-MSI 81920-edge      virtio2-config
 31:          0          6         23          0   PCI-MSI 81921-edge      virtio2-input
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    2827357    3221417    3260038    2232675   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          4          0          0         11   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     145625     134227     135505     132791   Rescheduling interrupts
CAL:      31639      32032      36893       9009   Function call interrupts
TLB:       8232       8188       8133       8324   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:         42         42         42         42   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0   Posted-interrupt wakeup event

jasonmccallister avatar Oct 05 '20 14:10 jasonmccallister

And is it always with the message do_IRQ: 3.37 No irq handler for vector with the 3.37 being important here? Also, when this occurs, does it seem to negatively affect the running instance or is it just scary noise in the logs?

townsend2010 avatar Oct 05 '20 14:10 townsend2010

That is the one I have seen most often. It does not seem to affect the running instance, just doing anything that requires reading output while it occurs makes it unreadable, as it happens repeatedly.

jasonmccallister avatar Oct 05 '20 14:10 jasonmccallister

Ok, this is looking to be some kernel issue on 20.04 when using Hyperkit. Unfortunately, because we have to boot the kernel itself, simply apt update on the kernel still won't use a newer kernel.

What output is being affected by this? Any output, like this is showing up in the TTY?

townsend2010 avatar Oct 05 '20 14:10 townsend2010

Here is an example of what I see when this occurs, I am running exec to call journalctl and it randomly occurred.

Screen Shot 2020-10-05 at 5 20 35 PM

It does not seem to cause an issue, I ran the make target to reload the service and output it still occurring:

Screen Shot 2020-10-05 at 5 23 02 PM

jasonmccallister avatar Oct 05 '20 21:10 jasonmccallister

Hi @jasonmccallister,

I've been trying to do some additional digging and this is certainly an issue w/ the kernel and either something in Hyperkit or some additional software installed in the instance. I looked through your Nitro custom cloud-init and there are certainly a host of additional packages installed, so it may be something there.

Also, it could be a red herring, but does this tend to occur when you running the make api step in Nitro?

Lastly, which kernel is reported in uname -r in the instance?

That said, I'm running out of ideas and I don't think there is anything in the Multipass realm that we can do to address this.

townsend2010 avatar Oct 07 '20 15:10 townsend2010

@townsend2010 this occurs when running nitro ssh which is a wrapper around multipass shell.

Here is the uname -r output:

5.4.0-48-generic

jasonmccallister avatar Oct 23 '20 18:10 jasonmccallister

Same here with 5.4.0-53-generic. The error occures when I open the multipass shell by the Open shell command on macOS. Because this messages interrupts typing, I cannot use my multipass instance any more.

sisyphosloughs avatar Nov 16 '20 17:11 sisyphosloughs

Same here with 5.4.0-54-generic kernel. Exiting the shell and reattaching using multipass shell instance-name.

The outputs are do_IRQ: 0.39 No irq handler for vector. It started when I'm compiling something, but I'm not sure how to reproduce it. Happy to test more and try to reproduce.

yangl1996 avatar Nov 24 '20 16:11 yangl1996

Same here with 5.4.0-54-generic kernel. Exiting the shell and reattaching using multipass shell instance-name.

The outputs are do_IRQ: 0.39 No irq handler for vector. It started when I'm compiling something, but I'm not sure how to reproduce it. Happy to test more and try to reproduce.

Same here. But I control+C to "cancel" this message, and I am able to proceed my work in multipass instance. But still, not sure what's happening here

xuzhe0205 avatar Nov 25 '20 13:11 xuzhe0205

I am running into this issue. I have just started using Multipass for the first time - a fresh install of the current version on my Mac running 10.15.7. Within minutes of creating a brand new instance of 20.04 (kernel 5.4.0-60-generic) and simply running a couple commands (in my case simply apt update / upgrade) these messages began appearing.

RSully avatar Jan 12 '21 02:01 RSully

I have the same issue on OSX, always the same failure message randomly popping up as posted before. other than that, multipass works beautifully for me :)

schlenki avatar Jan 12 '21 13:01 schlenki

I have the same issue, and it is quite new. The message:

Message from syslogd@primary at Jan 13 07:15:58 ... kernel:[ 97.526663] do_IRQ: 4.36 No irq handler for vector

Till now it never appeared. Rare, but disturbing on console.

multipass 1.5.0+mac multipassd 1.5.0+Mac macOS Big Sur 11.1

lmatusek avatar Jan 13 '21 06:01 lmatusek

Same issue here with Nitro and Multipass. Multipass deletes local bound vendor folder and after restart it outputs

nitro 1.1.1 multipass 1.5.0+mac multipassd 1.5.0+Mac macOS Big Sur 11.1

Message from syslogd@nitro-dev at Jan 13 17:38:48 ... kernel:[ 99.704253] do_IRQ: 3.37 No irq handler for vector

After "composer install" within nitro ssh I'm able to restart, but after few hours the instance crashes (and deletes the vendor folder again).

denisyilmaz avatar Jan 13 '21 16:01 denisyilmaz

Hey all,

It seems this issue is cropping up more and more. I've yet to reproduce on my MacBook, so in order to try to figure out anything common oh what triggers this, I need more data. Would would be helpful is to know:

  1. Mac hardware specs such as what type of Mac, CPU sku, amount of RAM, disk type and size.
  2. macOS version (most reports already have this, but it's still helpful if not posted).
  3. How the instance was launched such as any --cpu, --mem, and --disk options given and also the version of Ubuntu launched.
  4. Any mounts defined in the instance.
  5. Any additional software/packages installed in the instance.
  6. The type of activity occurring in the instance when this is seen.

That's all I can think of for now and thanks!

townsend2010 avatar Jan 13 '21 16:01 townsend2010

Here is my profile, attached. profile.txt

lmatusek avatar Jan 13 '21 17:01 lmatusek

Hello, my specs:

  1. MacBook Pro 13" Retina Mid 2018, 2.3 GHz Quad-Core i5-8259U, 16 GB @ 2133 MHz RAM, 512 GB Apple SSD
  2. I use macOS Catalina Version 10.15.7
  3. I launched as follows (but it also happened w/ other disk or ram launchs in the past) multipass launch -c 4 -m 4G -n test ; creating a Ubuntu 20.04 LTS VM
  4. No mounts in VM
  5. just ran sudo apt update, nothing else on the VM
  6. after sudo apt update, the failure happens; after installing Go 1.15.6 and Singularity 3.7.0 (plus dependencies), it stopped happening

schlenki avatar Jan 13 '21 19:01 schlenki

Aha, I was able to reproduce! Looking at journalctl, it looks to be an error in the ata driver in the instance's kernel. I'll see if I research this some more and see what is going on.

townsend2010 avatar Jan 13 '21 19:01 townsend2010

Quick update...it's the virtual DVD drive in the instance that is misbehaving on the virtual SATA bus. Since we are loading the kernel directly, I'm going to experiment with some kernel command line parameters such noapic, etc. to see if it helps without any detectable regressions.

townsend2010 avatar Jan 13 '21 21:01 townsend2010

Thank you for your very much appreciated work in this! :)

schlenki avatar Jan 14 '21 08:01 schlenki

Hey all, @townsend2010 has a package for you to try:

macOS build available: ~~multipass-1.7.0-dev.513.pr343+g46704338.mac-Darwin.pkg~~

Saviq avatar Jan 15 '21 11:01 Saviq

Installing this pkg I get the following fault printed on my terminal, trying to do a multipass launch command:

launch failed: cannot connect to the multipass socket
Please ensure multipassd is running and '/var/run/multipass_socket' is accessible

multipass version shows: multipass 1.7.0-dev.513.pr343+g46704338.mac

EDIT: reinstalling multipass using brew helped w/ the previous issue - however, when launching (also when trying multipass shell), I get: launch failed: The following errors occurred:
Instance stopped while starting

Reinstalling with brew as well as the normal package leads to the same fault (even before applying the dev build). Soooo I'm a bit clueless in terms of how that could happen. Will continue trying to make it work again, wish me luck - if you have any idea what possibly went wrong here, let me know :p

Edit No.2: ... that also is solved now (after a good solid restart - and also resetting my macs SMC) - but now I also don't have the original issue without the given package anymore. I'll try the dev version again as soon as the IRQ handler error pops up again, for now it seems to be fine without it.

schlenki avatar Jan 15 '21 18:01 schlenki

After some testing, it looks like adding the noapic kernel option works around this. I have not detected any regressions in behavior. Once our Mac ci is unblocked and builds a test package, I'll post an updated link here. The previous test package posted here is broken for unrelated reasons.

townsend2010 avatar Jan 22 '21 15:01 townsend2010

Hi all,

Here is a test package that is working for me: multipass-1.7.0-dev.549.pr343+g83f3483e.mac-Darwin.pkg

Please try it out and report back if it fixing this issue for you. Thanks!

townsend2010 avatar Jan 25 '21 13:01 townsend2010

Here's a link you'll be able to access:

https://multipass-ci.s3.amazonaws.com/pr343/multipass-1.7.0-dev.549.pr343%2Bg83f3483e.mac-Darwin.pkg

Saviq avatar Jan 25 '21 15:01 Saviq

Hi all, has anyone had the chance to try the above package if it fixes the issue for you?

Saviq avatar Feb 11 '21 16:02 Saviq

Hi,

I am sorry I haven’t got time to make any check till now. Now I am trying….

Br,

Laszlo

  1. febr. 11. dátummal, 17:31 időpontban Michał Sawicz [email protected] írta:

Hi all, has anyone had the chance to try the above package if it fixes the issue for you?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/canonical/multipass/issues/1778#issuecomment-777621786, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKGP4O6YSUP4KOFJ4R6IZELS6QA5BANCNFSM4SB4J26A.

lmatusek avatar Feb 12 '21 08:02 lmatusek