pimd
pimd copied to clipboard
PIMD causes random kernel panics: needs to validate interfaces before attempted use
My pfSense started reliably crashing (random but always within 1-2 hr of boot) with kernel panics pointing to pimd.
After much sleuthing, I discovered the root cause:
- Over a year ago, I eliminated some VLANs and subnets on my network (now combining wifi/wired instead of separate for each)
- As a result, several interfaces were deleted
- Not surprisingly, the system is unable to auto-delete the interfaces (and associated subnets) from various places in the overall config (firewall rules and aliases, pimd config, pfblocker config)
- In most cases, nonexistent interfaces are ignored
However, pimd doesn't ignore the invalid interfaces. It apparently attempts to use them or at least do something with them.
Most of the time, this doesn't cause an issue (I did run for about a year like this!) But sometimes it does... leading to a kernel panic.
I think I have backup copies of various config files if necessary.
(Note: I am still eventually hoping to track down #171 )
I'm sorry to hear that, but kernel crashes are not really the responsibility of userland applications. The kernel should handle such issues. You have the option of course of disabling pimd on select interface (or disable all, and then selectively enable), which could be a way to work around this problem.
Hmmm. That makes sense. (BTW, I do disable all and selectively enable. It's a bad *.conf file containing references to invalid interfaces due to having deleted some VLANs. Thus, pimd isn't validating interface references in the conf file? )
Yet, here's the strange thing I noted: this is a kernel panic in the "pim" area of the kernel.
Do I understand correctly that pimd is modifying various parameters of the built-in "pim" support?
On further reflection, presumably:
- there ought to be nothing (even 100% invalid info) that pimd can do to cause a kernel crash.
- If there IS such a crash, then the kernel is not fully validating the information passed to it.
Does that make sense? Or am I simply out to lunch on this? ;)
Yes correct. You are spot on. Regardless what little old pimd does, it shouldn't crash the kernel.
So I will report this up-chain 😏
On further reflection... it's TRUE that userland "shouldn't" crash the kernel. At the same time, defense-in-depth design suggests that all layers are well served by handling things well. Can't be much overhead to validate interfaces at startup time and remove (or complain about) any that are invalid.
Of course. And at startup pimd (master) does this by calling getifaddrs()
, comparing that with the interfaces enabled (from command line and) in pimd.conf
. There is of course a certain window where interfaces may go missing between this initial probe and the milliseconds in between pimd actuall registers VIFs with the kernel to be used for multicast routing. At runtime, however, there is no check if interfaces suddenly go disappear. Here it really is up to the kernel to clean up any VIFs registered with phyiscal interfaces, and return EIO or similar for any outstanding recvfrom()
, setsockopt()
, or similar that have a socket open on any of these interfaces.
In my case, we're not dealing with dynamic change. Six months ago I removed some VLANs, and thus the associated interfaces. For whatever reason, now having those in pimd.conf leads to kernel panics within seconds to hours of startup.
I hear you saying you already validate the interfaces. Should an invalid interface in pimd.conf...
- Produce a startup error?
- Simply be ignored by PIMD?
Something strange certainly happens.
You know, I'm not really sure what it is you expect I should do for you here in this bug report. You've not declared how you start pimd, or what your pimd.conf runs. I don't know what version of pimd you are using, or what patches pfSense may have applied to integrate it into FreeBSD. I don't use FreeBSD myself, so for the pimd project that's like a second or third tier platform.
There's lots of error handling in pimd, the code is here on GitHub for you to inspect and learn. Check out config.c, vif.c and kern.c and you can see the error handling and log levels used for each such message.
I understand. Time for me to put on my coder/diagnosis hat and dig in. ;)