r8125-esxi

drops off network

Open Rolzzz opened this issue 2 years ago • 21 comments

Hello there, thank you for providing the drivers. Unfortunately, when I do large file transfers (a 40 GB vmdk file), my ESXi host drops off the network. I have to unplug the network cable and plug it back in for connectivity to resume. I've tried other cables and switches, but the issue persists.

esxcli network nic get -n vmnic0

Advertised Auto Negotiation: true
Advertised Link Modes: 10BaseT/Half, 10BaseT/Full, 100BaseT/Half, 100BaseT/Full, 1000BaseT/Full, 2500BaseX/Full
Auto Negotiation: true
Cable Type: Twisted Pair
Current Message Level: 51
Driver Info:
   Bus Info: 0000:02:00.0
   Driver: r8125
   Firmware Version:
   Version: 9.007.01-NAPI
Link Detected: true
Link Status: Up
Name: vmnic0
PHYAddress: 0
Pause Autonegotiate: true
Pause RX: true
Pause TX: true
Supported Ports: TP
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver: internal
Virtual Address: 00:50:56:5a:e2:95
Wakeon: MagicPacket(tm)

Rolzzz avatar Jun 20 '22 06:06 Rolzzz

Hi!

I have the same issue. What machine are you running ESXi on? I've tried drivers up to 9.009.01 on an ASUS PN51. I've read somewhere that a 2.5 Gb network card with a Realtek chip, although USB based, had heat issues and started dropping connections and throttling down as the heat rose. That might be the culprit, since for me it happens when there is a lot of transfer.

/Kaj

KajLehtinen avatar Jul 17 '22 07:07 KajLehtinen

Yeah, I decided to move away from the onboard NIC in my ASUS PN50-E1, so I went with an external USB NIC with the ASIX AX88179 chipset (https://flings.vmware.com/usb-network-native-driver-for-esxi). Not ideal, but I wanted a stable ESXi box. Funnily enough, even though it's USB, this one (https://www.amazon.com.au/gp/product/B00AQM8586) actually had faster throughput than the native NIC with this driver (before it would drop off, of course). If the driver gets updated I'd be all too happy to test again.

Rolzzz avatar Jul 17 '22 20:07 Rolzzz

Have you tried the version located here: https://github.com/lengfwang/r8125-esxi6.7? It seems to be the newest one someone has compiled and put up here.

KajLehtinen avatar Jul 18 '22 07:07 KajLehtinen

I have not seen this one and will have to try it 👍

Rolzzz avatar Jul 18 '22 09:07 Rolzzz

@KajLehtinen sadly still same issue

Rolzzz avatar Jul 19 '22 01:07 Rolzzz

I have the same issue, is it really overheating?

Haxiboy avatar Aug 16 '22 17:08 Haxiboy

I'd be surprised if it's an overheating hardware issue; we'd hear more from regular Windows users if that were the case.

Rolzzz avatar Aug 16 '22 20:08 Rolzzz

I thought my issue had gone away with lengfwang's fork, but it happened today. It could be heat, as I noticed it only happens when I put a heavy workload on the NIC. After I took off my rack's side panel I had to wait 2 weeks for the issue to happen again. (I turned off the climate control in the room next to the rack.) I'll borrow a thermal camera and monitor what's happening around the NIC and the controller; maybe a small heat sink will solve the problem.

Haxiboy avatar Aug 30 '22 20:08 Haxiboy

I can get it to crash every time I send an 80 GB vmdk file over via WinSCP. Then I have to unplug the NIC from the switch, wait, and plug it back in, and it starts working again... until, after x minutes, resuming the WinSCP transfer makes it fall over again.

be interested to hear if you can replicate that...

Here is where mine sits in my study... I don't think it's heat related. [image]

Rolzzz avatar Aug 30 '22 22:08 Rolzzz

Mine is in a standard 4U rack with a ton of Noctua fans. I had an issue with an overheating Intel NIC before, but I have a dual gigabit NIC lying around, so I'll try with that too. Or maybe some load balancing would work. The strange thing is that we watch movies all day and have torrents downloading 24/7, but the issue only shows up when downloading via Sonarr. It could be a coincidence, though.

Haxiboy avatar Aug 31 '22 02:08 Haxiboy

I have the same issue when using this driver under ESXi 6.7 on an ASUSTOR AS6702t. Because of this issue, it is not possible to use ESXi on the device. Under Windows this does not happen on the same device, so I suspect a driver issue or a configuration issue within this driver. Looking forward to any solutions that are found.

Sushifix avatar Aug 31 '22 07:08 Sushifix

I have the same problem. My setup is pfSense running inside my ESXi host: one Intel NIC is passed through to pfSense, and the second (onboard) Realtek NIC is managed by the ESXi host. Same behaviour: after heavy load (downloading tens of GBs from Steam) the connection drops randomly. The only fix is to disconnect the network cable and reconnect it. I'm not going to wait for a fix on the driver side; I'll buy a new PCIe NIC with an Intel chip and do it that way...

jakubsuchybio avatar Sep 22 '22 21:09 jakubsuchybio

Unfortunately, same issue here. It drops every time there is load. Disabling and re-enabling the interface on the switch reconnects it. I've built my own custom driver from the latest 9.011.00 and the issue persists.

mcr-ksh avatar Mar 24 '23 10:03 mcr-ksh

gave up with the onboard nic... POS for esxi. Got a usbc one and been rock solid ever since.

Rolzzz avatar Mar 25 '23 10:03 Rolzzz

Can you get a USB NIC without the CPU penalty? I read somewhere that USB based NICs don't have access to DMA and therefore they load the CPU.

On my home system, I haven't noticed any extra unexplained CPU load under normal use. I see the CPU go up when I'm downloading big (high seed) torrent files, but I saw that on physical boxes too, before I virtualised my torrenting machine.

Rolzzz avatar Mar 25 '23 21:03 Rolzzz

In the release there are a few scripts mentioned which I cannot find anywhere, nor do I know how to properly turn on/off these settings. Anyone tried/found them?

/opt/r8125/temp.sh : Show NIC chipset temperature.
/opt/r8125/tx-off.sh: Turn off Tx offloading, when you cannot open guest openwrt web page, or lagging Windows network neighbor file copy.
/opt/r8125/tx-on.sh: Turn on Tx offloading, default.
/opt/r8125/tso-off.sh: Turn off TSO, default.
/opt/r8125/tso-on.sh: Turn on TSO, try this when you have a nice host PC.

mcr-ksh avatar Mar 25 '23 22:03 mcr-ksh
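
Editor's sketch: the `/opt/r8125/*.sh` scripts above are not reproduced anywhere in this thread, so the following is not their content. It only prints stock `esxcli` commands that report TSO and checksum-offload state and that toggle host-wide hardware TSO. Whether toggling `/Net/UseHwTSO` has the same effect as `tso-off.sh` or `tx-off.sh` is an assumption, and the helper names `offload_status_cmds` and `hw_tso_cmd` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the stock esxcli commands that
# report TSO and checksum-offload state for one NIC, plus the host-wide
# hardware TSO advanced setting.
offload_status_cmds() {
    nic="$1"
    echo "esxcli network nic tso get -n $nic"
    echo "esxcli network nic cso get -n $nic"
    echo "esxcli system settings advanced list -o /Net/UseHwTSO"
}

# Hypothetical helper: print the command that sets host-wide hardware TSO
# (0 = off, 1 = on). This is a host setting, not an r8125 driver option.
hw_tso_cmd() {
    echo "esxcli system settings advanced set -o /Net/UseHwTSO -i $1"
}

offload_status_cmds vmnic0
hw_tso_cmd 0
```

The output is meant to be reviewed and then pasted into an ESXi shell; nothing is changed on the host by running the script itself.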

No, I haven't seen these.

Rolzzz avatar Mar 26 '23 03:03 Rolzzz

I think I just found the issue. I'm currently doing a full rewrite of the driver. I was able to nail it down to DAC (PCI Dual Address Cycle, i.e. 64-bit DMA addressing). [image] [http://gauss.ececs.uc.edu/Courses/c4029/lectures/dma.pdf]

TSO doesn't work with DAC, and maybe 6.7 doesn't properly support it. Until I release mine, it can be tested via:

vmkload_mod r8125 enable_tso=1 enable_tx_csum=1 eee_enable=0 hwoptimize=1 tx_no_close_enable=1 enable_double_vlan=1 use_dac=0 autoneg_mode=1

mcr-ksh avatar Mar 30 '23 09:03 mcr-ksh
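
Editor's sketch: options passed to `vmkload_mod` apply only to that load of the module. Assuming the installed r8125 build accepts the parameter names quoted above, one way to keep them across reboots is the stock `esxcli system module parameters set` command. The helper name `r8125_param_cmd` is hypothetical, and the script only prints the command so it can be checked before it is run on a host.

```shell
#!/bin/sh
# Parameter string copied from the comment above. Whether a given r8125
# build accepts each name is an assumption; check on the host with:
#   esxcli system module parameters list -m r8125
R8125_PARAMS="enable_tso=1 enable_tx_csum=1 eee_enable=0 hwoptimize=1 tx_no_close_enable=1 enable_double_vlan=1 use_dac=0 autoneg_mode=1"

# Hypothetical helper: print (do not run) the esxcli command that would
# store the given parameter string for the given module.
r8125_param_cmd() {
    printf 'esxcli system module parameters set -m %s -p "%s"\n' "$1" "$2"
}

r8125_param_cmd r8125 "$R8125_PARAMS"
```

Stored parameters are read when the module is next loaded, so a reboot (or a module reload) is needed before `use_dac=0` takes effect.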

https://github.com/mcr-ksh/r8125-esxi/releases/tag/net-r8125-9.011.00

mcr-ksh avatar May 08 '23 20:05 mcr-ksh