pimd icon indicating copy to clipboard operation
pimd copied to clipboard

PIMD causes random kernel panics: needs to validate interfaces before attempted use

Open MrPeteH opened this issue 2 years ago • 9 comments

My pfSense started reliably crashing (random but always within 1-2 hr of boot) with kernel panics pointing to pimd.

After much sleuthing, I discovered the root cause:

  • Over a year ago, I eliminated some VLANs and subnets on my network (now combining wifi/wired instead of separate for each)
  • As a result, several interfaces were deleted
  • Not surprisingly, the system is unable to auto-delete the interfaces (and associated subnets) from various places in the overall config (firewall rules and aliases, pimd config, pfblocker config)
  • In most cases, nonexistent interfaces are ignored

However, pimd doesn't ignore the invalid interfaces. It apparently attempts to use them or at least do something with them.

Most of the time, this doesn't cause an issue (I did run for about a year like this!) But sometimes it does... leading to a kernel panic.

I think I have backup copies of various config files if necessary.

(Note: I am still eventually hoping to track down #171 )

MrPeteH avatar Mar 06 '22 20:03 MrPeteH

I'm sorry to hear that, but kernel crashes are not really the responsibility of userland applications. The kernel should handle such issues. You have the option of course of disabling pimd on select interface (or disable all, and then selectively enable), which could be a way to work around this problem.

troglobit avatar Mar 07 '22 05:03 troglobit

Hmmm. That makes sense. (BTW, I do disable all and selectively enable. It's a bad *.conf file containing references to invalid interfaces due to having deleted some VLANs. Thus, pimd isn't validating interface references in the conf file? )

Yet, here's the strange thing I noted: this is a kernel panic in the "pim" area of the kernel.

Do I understand correctly that pimd is modifying various parameters of the built-in "pim" support?

On further reflection, presumably:

  • there ought to be nothing (even 100% invalid info) that pimd can do to cause a kernel crash.
  • If there IS such a crash, then the kernel is not fully validating the information passed to it.

Does that make sense? Or am I simply out to lunch on this? ;)

MrPeteH avatar Mar 11 '22 18:03 MrPeteH

Yes correct. You are spot on. Regardless what little old pimd does, it shouldn't crash the kernel.

troglobit avatar Mar 11 '22 22:03 troglobit

So I will report this up-chain 😏

MrPeteH avatar Mar 13 '22 04:03 MrPeteH

On further reflection... it's TRUE that userland "shouldn't" crash the kernel. At the same time, defense-in-depth design suggests that all layers are well served by handling things well. Can't be much overhead to validate interfaces at startup time and remove (or complain about) any that are invalid.

MrPeteH avatar Mar 15 '22 00:03 MrPeteH

Of course. And at startup pimd (master) does this by calling getifaddrs(), comparing that with the interfaces enabled (from command line and) in pimd.conf. There is of course a certain window where interfaces may go missing between this initial probe and the milliseconds in between pimd actuall registers VIFs with the kernel to be used for multicast routing. At runtime, however, there is no check if interfaces suddenly go disappear. Here it really is up to the kernel to clean up any VIFs registered with phyiscal interfaces, and return EIO or similar for any outstanding recvfrom(), setsockopt(), or similar that have a socket open on any of these interfaces.

troglobit avatar Mar 15 '22 08:03 troglobit

In my case, we're not dealing with dynamic change. Six months ago I removed some VLANs, and thus the associated interfaces. For whatever reason, now having those in pimd.conf leads to kernel panics within seconds to hours of startup.

I hear you saying you already validate the interfaces. Should an invalid interface in pimd.conf...

  • Produce a startup error?
  • Simply be ignored by PIMD?

Something strange certainly happens.

MrPeteH avatar Mar 17 '22 01:03 MrPeteH

You know, I'm not really sure what it is you expect I should do for you here in this bug report. You've not declared how you start pimd, or what your pimd.conf runs. I don't know what version of pimd you are using, or what patches pfSense may have applied to integrate it into FreeBSD. I don't use FreeBSD myself, so for the pimd project that's like a second or third tier platform.

There's lots of error handling in pimd, the code is here on GitHub for you to inspect and learn. Check out config.c, vif.c and kern.c and you can see the error handling and log levels used for each such message.

troglobit avatar Mar 17 '22 17:03 troglobit

I understand. Time for me to put on my coder/diagnosis hat and dig in. ;)

MrPeteH avatar Mar 24 '22 18:03 MrPeteH