Network Error

Open PebThePebble opened this issue 1 year ago • 45 comments

I tried to update to 6.12.8 with your kernel so that my A380 works on Linux. It works perfectly fine on 6.12.6. I updated the Unraid OS, added your kernel for 6.12.8, and suddenly it kept refusing to connect to the network, to the point that I had to roll back to the 6.12.6 one just to get an IP assigned. The network works perfectly fine on vanilla Unraid 6.12.8, but then I don't have support for the Intel A380, which is why I need the kernel change. It's only once I add your latest 6.12.8 update that I can't get it working at all.

PebThePebble avatar Feb 22 '24 00:02 PebThePebble

Same here. My system is unable to obtain an IP on the latest release candidate (expected).

freehelpdesk avatar Feb 22 '24 02:02 freehelpdesk

Careful, there are two 6.8rc5 releases, one for 6.12.6 and one for 6.12.8: https://github.com/thor2002ro/unraid_kernel/releases/tag/20240220 I don't have an Arc, so I can't comment on that, but the network works fine here on 6.12.8.

thor2002ro avatar Feb 22 '24 06:02 thor2002ro

Yes, I was using the correct version, the one released 2 days ago. The card showed up: I was able to run "ls /dev/dri" and see it. It would just always refuse to connect to the network. I could still log in and use Unraid in the terminal, which is how I was able to check that the GPU showed up. But the second I remove your kernel change and go back to the vanilla Unraid kernel, the card no longer works, as their kernel doesn't support Intel GPUs, but it does then obtain a network connection and works at 100%.

PebThePebble avatar Feb 22 '24 07:02 PebThePebble

The second I put my unraid back to 6.12.6 and installed your kernel for that version it all worked perfectly fine again

PebThePebble avatar Feb 22 '24 07:02 PebThePebble

https://github.com/thor2002ro/unraid_kernel/issues/21

Yeah, I have the same issue.

confuzedplayer avatar Feb 22 '24 07:02 confuzedplayer

I could try compiling 6.7 for 6.12.8 to see if it's fine. Everything works fine here. Maybe try dumping dmesg when the network doesn't work; "dmesg > /boot/dmesg.txt" should do the trick.
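
A minimal sketch of that capture, assuming a root shell on the affected machine. The function names are illustrative and not part of Unraid; the filter simply keeps lines that mention a br<N> or bond<N> interface.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not Unraid tooling.

# Dump the kernel ring buffer to a file (usually needs root).
# $1: output path; defaults to /boot/dmesg.txt as suggested in the thread.
dump_dmesg() {
    dmesg > "${1:-/boot/dmesg.txt}"
}

# Print only the lines of a saved dump that mention br<N> or bond<N>.
# $1: path to a saved dmesg dump.
filter_net_lines() {
    grep -wE '(br|bond)[0-9]+' "$1"
}
```

Typical use would be `dump_dmesg` while the network is down, followed by `filter_net_lines /boot/dmesg.txt` to narrow the log before posting it.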

thor2002ro avatar Feb 22 '24 09:02 thor2002ro

Also jumping in to say I had similar strange network issues with 6.12.8. I run a 2-port Mellanox NIC and run br1 on the second port for a few containers. For some reason, using the 6.12.8 kernel release caused loads of issues with my containers on a custom network on br1. It took ages for me to figure out it was the kernel causing it, and everything worked fine after rolling back to the stock kernel!

jaimbo avatar Feb 22 '24 10:02 jaimbo

> I could try compiling the 6.7 for 6.12.8 to see if its fine

If you could, please. I would like to give that a try.

confuzedplayer avatar Feb 22 '24 16:02 confuzedplayer

Not sure if this is where the problem is happening:

[image]

freehelpdesk avatar Feb 23 '24 01:02 freehelpdesk

>> I could try compiling the 6.7 for 6.12.8 to see if its fine

> if you could please, i would like to give that a try

Here is 6.7.5: https://github.com/thor2002ro/unraid_kernel/releases/tag/20240223

> Not sure if this is where the problem is happening,
>
> image

Dump dmesg to the USB boot device; "dmesg > /boot/dmesg.txt" should do the trick.

thor2002ro avatar Feb 23 '24 08:02 thor2002ro

Just thought I'd throw my info in here too. I don't have network either.

Unraid 6.12.6 - last kernel to work was 6.7.0. Tried 6.7.2, 6.7.3 and a few of the 6.8b.
Unraid 6.12.8 - last kernel to work was 6.7.0. Tried 6.7.5 and a few of the 6.8b. Haven't tried the last beta one compiled for Unraid 6.12.8 since 6.7.5 didn't work.

Seems like the problem happened after 6.7.0, and upgrading to the new Unraid did nothing to change things for me. Going back to kernel 6.7.0 fixes it on both versions of Unraid.

I assume it is because I also use bridge mode. I use a Broadcom BCM57416 chipset though.

kaaresgut avatar Feb 23 '24 16:02 kaaresgut

I use bridge also, with a BCM5720. If you post logs, maybe we can figure it out.

thor2002ro avatar Feb 23 '24 16:02 thor2002ro

> I use bridge also.... with BCM5720 if you would post logs maybe we could figure out....

dmsg 6.7.5 dmsg 6.7.0

Looks like br0 is left in disabled state on 6.7.5, but in 6.7.0 it enters forwarding state.
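
A rough sketch of how that comparison could be made from two saved dumps. It assumes the kernel's usual "brX: port N(dev) entered <state> state" message format; the dump file names in the usage note are hypothetical.

```shell
#!/bin/bash
# Sketch only: prints the last state a bridge reported for any of its ports.
# $1: saved dmesg dump, $2: bridge name (default br0).
last_bridge_state() {
    local log=$1 bridge=${2:-br0}
    grep -E "${bridge}: port [0-9]+\(.*\) entered [a-z]+ state" "$log" \
        | tail -n 1 \
        | sed -E 's/.* entered ([a-z]+) state.*/\1/'
}
```

For example, `last_bridge_state dmesg-6.7.5.txt` and `last_bridge_state dmesg-6.7.0.txt` would be expected to print "disabled" and "forwarding" respectively if the dumps match the description above.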

kaaresgut avatar Feb 23 '24 22:02 kaaresgut

>>> I could try compiling the 6.7 for 6.12.8 to see if its fine

>> if you could please, i would like to give that a try

> here is 6.7.5 https://github.com/thor2002ro/unraid_kernel/releases/tag/20240223

>> Not sure if this is where the problem is happening, image

> dump dmesg to usb boot "dmesg > /boot/dmesg.txt" should do the trick

I got it to boot up with this, but I'm unable to get the Arc working.

confuzedplayer avatar Feb 24 '24 03:02 confuzedplayer

>> I use bridge also.... with BCM5720 if you would post logs maybe we could figure out....

> dmsg 6.7.5 dmsg 6.7.0
>
> Looks like br0 is left in disabled state on 6.7.5, but in 6.7.0 it enters forwarding state.

Curious: do you have both bonding and bridging enabled? Try using only bridging or only bonding. From the kernel's point of view everything looks fine; it's just a configuration issue somewhere.

Edit: yeah, tested it in an Unraid VM, and enabling bonding breaks the network. I'm only using bridging, which is why it's fine here. Interesting... maybe the bonding interface changed some defaults.
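
One way to check from a terminal whether a box has both enabled is to look for the per-device sysfs directories the kernel creates: "bonding" for bond masters and "bridge" for bridges. This is a sketch under that assumption; the function name is made up, and the second argument exists only so the sysfs root can be overridden.

```shell
#!/bin/bash
# Sketch only: list interfaces of one kind by probing sysfs.
# $1: "bonding" or "bridge"; $2: sysfs net root (default /sys/class/net).
list_ifaces() {
    local kind=$1 root=${2:-/sys/class/net} dev
    for dev in "$root"/*; do
        # Only bond masters have a bonding/ dir, only bridges a bridge/ dir.
        [ -d "$dev/$kind" ] && basename "$dev"
    done
    return 0
}
```

If both `list_ifaces bonding` and `list_ifaces bridge` print something, the machine is in the combined bonding plus bridging setup discussed here.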

thor2002ro avatar Feb 24 '24 11:02 thor2002ro

One thing I did notice going from 6.7.0 to later versions is the compiler switch. I'm not sure if that could be screwing something up deep down in the source tree. Just a thought, maybe worth testing.

freehelpdesk avatar Feb 24 '24 15:02 freehelpdesk

Sorry, would you be able to compile one for 6.12.8 using 6.7.0?

confuzedplayer avatar Feb 24 '24 16:02 confuzedplayer

> One thing i did notice from 6.7.0 to > 6.7.0 is the compiler switch, I'm not sure if that can be screwing something up deep down in the source tree. Just a thought maybe worth testing.

Could be the compiler, but not likely. I will try some variations.

> Sorry, would u be able to compile one for 6.12.8 using 6.70?

I should be able to, since I have git saved.

thor2002ro avatar Feb 24 '24 17:02 thor2002ro

>> One thing i did notice from 6.7.0 to > 6.7.0 is the compiler switch, I'm not sure if that can be screwing something up deep down in the source tree. Just a thought maybe worth testing.

> could be the compiler ... but not likely.... I will try some variations...

>> Sorry, would u be able to compile one for 6.12.8 using 6.70?

> I should be able to... since I have git saved...

Thank you

confuzedplayer avatar Feb 24 '24 18:02 confuzedplayer

Here are versions of 6.7, 6.7.5 and 6.8rc5 built with clang19: https://github.com/thor2002ro/unraid_kernel/releases/tag/20240224 Didn't have time to test them, though.

thor2002ro avatar Feb 24 '24 20:02 thor2002ro

Thank you, I'll test a little later.

confuzedplayer avatar Feb 24 '24 21:02 confuzedplayer

I tried the 6.7.5 clang version (dmesg); the compiler doesn't seem to be the reason. Still no network.

I then tried the 6.7.0 clang version (dmesg). It works.

About having both bonding and bridge... not sure why it's like that. I've never touched the network part of unraid. It worked upon my first install and I just never touched it.

kaaresgut avatar Feb 24 '24 22:02 kaaresgut

Was poking around the unraid forums and stumbled upon this. Someone was trying your 6.7.3 kernel and couldn't get network. He managed to fix it by running some commands on each boot. And then the more interesting bit in the comments:

> Unraid 6.12.8 has an earlier kernel (point) version, because later kernels have a modification which breaks bonding. For future Unraid versions we made a modification to support bonding on latest kernel versions.

I wonder what the modification could be.

kaaresgut avatar Feb 25 '24 00:02 kaaresgut

Had some time today and played with it a little. The issue seems to be in bond_up in /etc/rc.d/rc.inet1:

run ip link set $BONDIF master ${BONDNAME[$i]} up type bond_slave

doesn't work anymore. Changing it to

run ip link set $BONDIF master ${BONDNAME[$i]} type bond_slave
run ip link set $BONDIF up

gets everything working.
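
The change above splits one `ip link set` call into two: enslave first, then bring the interface up. A minimal sketch of that sequence outside of rc.inet1, assuming hypothetical interface names; the `run` function here is only a stand-in, since the real wrapper in rc.inet1 is not shown in this thread.

```shell
#!/bin/bash
# Stand-in for the `run` wrapper seen in the rc.inet1 lines (an assumption):
# prints the command when DRY_RUN=1, executes it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach one interface to a bond in two steps.
# $1: slave interface, $2: bond interface.
enslave_then_up() {
    run ip link set "$1" master "$2" type bond_slave
    run ip link set "$1" up
}
```

Running `DRY_RUN=1 enslave_then_up eth0 bond0` prints the two commands without touching the network, which is a safe way to review them before trying them as root.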

thor2002ro avatar Feb 25 '24 13:02 thor2002ro

Made a patch package for Unraid that fixes bonding.

Just make a packages dir in the root of the USB stick, and in the usb/config/go file add

upgradepkg --install-new /boot/packages/*.TGZ

before "Start the Management Utility".

unraid_fix-bond_6.12.8-2024.02.25-x86_64-thor.TGZ

PS: don't use the extra directory because you're lazy; it runs too early in the boot cycle and booting will freeze.
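
A sketch of what the go-file addition could look like as a loop. It assumes the /boot/packages directory from the comment above, and it skips files whose extension is not lowercase .tgz, because a later reply in this thread reports that the uppercase .TGZ name was rejected with an unsupported extension error. The function names are illustrative.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Succeed if the file name ends in lowercase ".tgz".
has_lowercase_tgz() {
    case "$1" in
        *.tgz) return 0 ;;
        *)     return 1 ;;
    esac
}

# Install every package in a directory, skipping names with the wrong case.
# $1: package directory (default /boot/packages, as in the thread).
install_packages() {
    local dir=${1:-/boot/packages} pkg
    for pkg in "$dir"/*; do
        [ -e "$pkg" ] || continue
        if has_lowercase_tgz "$pkg"; then
            upgradepkg --install-new "$pkg"
        else
            echo "skipping $pkg: rename it to end in .tgz" >&2
        fi
    done
}
```

Calling `install_packages` before the "Start the Management Utility" line would mirror the placement described above.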

thor2002ro avatar Feb 25 '24 14:02 thor2002ro

I was able to have bonding and bridging mode enabled and Arc working on 6.7.0.

Thank you for the compiles.

confuzedplayer avatar Feb 25 '24 15:02 confuzedplayer

I installed the package and tried 6.8b5. Not working. Then I ran the command in the terminal instead, and it gave me an unsupported extension error. Huh? I noticed that the package had its extension in all caps, so I changed it to lowercase. After changing the filename and the line in the go file from TGZ to tgz, it worked.

Now I'm booting and running 6.8.b5 just fine.

kaaresgut avatar Feb 25 '24 17:02 kaaresgut

glad it worked

thor2002ro avatar Feb 25 '24 18:02 thor2002ro

fixed this in 6.7.9 :) was an intentional breakage....

thor2002ro avatar Mar 07 '24 18:03 thor2002ro

> fixed this in 6.7.9 :) was an intentional breakage....

Is the unraid_fix-bond_6.12.8-2024.02.25-x86_64-thor.TGZ file still required?

I just tried installing the 6.7.9 release and am still having the same issue with my custom br1 Docker network not being able to access containers on the host.

For more context, I run NginxProxyManager as a reverse proxy and I run this on port 80/443 on a second NIC port/br1. This allows me to leave the Unraid Web Interface on the primary NIC with default ports and use NPM on a second IP also with default ports.

When running your kernel, NPM can no longer proxy to other containers on the unraid host, which does work with the stock kernel. (I am able to access the web interface for NPM and proxies to other clients on my home network are also working)

I have tried disabling and re-enabling "Host access to custom networks" which doesn't get things working again.

Any ideas on why this isn't working?

jaimbo avatar Mar 10 '24 19:03 jaimbo