Theseus icon indicating copy to clipboard operation
Theseus copied to clipboard

Potential infinite loop bug in ixgbe driver

Open Hoblovski opened this issue 2 years ago • 5 comments

Hi Theseus community, I would like to report a potential bug.

The following lines of code contains a potential infinite loop bug.

// kernel/ixgbe/src/lib.rs: IxgbeNic::rx_init
            while rxq.rxdctl.read() & RX_Q_ENABLE == 0 {}

The bug is triggered when the device misbehaves, and generates faulty register values that never has the receive enable bit. The driver will be stuck in this loop, resulting in denial of service.

The linux ixgbe driver avoids the problem via a counter bounding the loop, like

// drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

#define IXGBE_MAX_RX_DESC_POLL 10
static void ixgbe_rx_desc_queue_enable(...) {
	int wait_loop = IXGBE_MAX_RX_DESC_POLL;
	// ...
	do {
		usleep_range(1000, 2000);
		rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
	} while (--wait_loop && !(rxdctl & IXGBE_RXDCTL_ENABLE));

	if (!wait_loop) {
		e_err(drv, "RXDCTL.ENABLE on Rx queue %d not set within "
		      "the polling period\n", reg_idx);
	}
}

Can we confirm this bug? Also it can be fixed simply via a counter like the linux driver. I've found a few similar cases but this one is representative.

Thank you very much.

Hoblovski avatar May 12 '23 18:05 Hoblovski

Thanks for the report! I do agree that we need a timeout there.

I believe that @Ramla-I is working on a series of formally-verified device drivers for the NICs in Theseus, so this is likely already in progress or being addressed with a redesign.

In any case, thanks for bringing this up, and feel free to identify the other places in the code you've found such loops in this issue too.

kevinaboos avatar May 16 '23 12:05 kevinaboos

Yes, this is a bug and should be addressed. We've just been lucky that we've never been stuck in the infinite loop. I'll make sure to address these changes in a future driver PR, but let's keep this issue open till it's resolved.

Ramla-I avatar Jun 12 '23 19:06 Ramla-I

I'm awfully sorry for the late reply, but thank you for the feedback.

Yes there're two more cases, and I've cross checked with linux code to filter out false positives, and the rest are:

  • ixgbe lib.rs:917
    • This one's similar, just TX instead of RX.
  • ixgbe lib.rs:704
    • This one's suspicious, and the linux ixgbe seems like bounded by timeout
    // linux/drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c
      	if (hw->eeprom.ops.read(hw, ++data_offset, &data_value))
      		goto setup_sfp_err;
      	while (data_value != 0xffff) {
      		IXGBE_WRITE_REG(hw, IXGBE_CORECTL, data_value);
      		IXGBE_WRITE_FLUSH(hw);
      		if (hw->eeprom.ops.read(hw, ++data_offset, &data_value))
      			goto setup_sfp_err;
      	}
    

Still I'm sorry for the late reply and thank you again for the feedback.

Hoblovski avatar Jun 21 '23 08:06 Hoblovski

Hi @Ramla-I @kevinaboos , sorry for disturbing. I wonder if the latest bugs mentioned in the last reply can be confirmed. Thank you very much!

Hoblovski avatar Jul 04 '23 09:07 Hoblovski

Right, these are both infinite loop bugs. Thanks for pointing them out!

Ramla-I avatar Jul 04 '23 21:07 Ramla-I