Fail-safe mechanism for potentially dangerous scripts (e.g. that affect basic networking)

Open pgorod opened this issue 4 years ago • 5 comments

An experience in locking myself out of the UDM

Background: while testing my special configs on the UDM, I was playing with routes (and ip rules) in the shell and, well, completely broke the UDM's networking, which effectively meant I no longer had any access to it. I rebooted and everything came back.

But now that I am trying to use these potentially dangerous commands in a boot script with udm-utilities, I wonder: what if a bug creeps in and bricks my UDM, then re-bricks it every time I reboot? Not pleasant.

Ideas for solutions inside the script?

I'd just like to hear suggestions about ways to build my script in ways that allow recovery from such a situation.

A simple solution I thought about: start my script with a 3 minute delay. If it goes well, fine (just a bit slow to get to a ready state...). If it breaks everything, I can reboot, and I have 3 minutes to fix things or kill the script.
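A minimal sketch of that delayed start, assuming a POSIX `sh` boot script. The names `SKIP_FILE`, `run_after_grace` and `apply_risky_config` are hypothetical and not part of udm-utilities. During the grace period, creating the skip file from an SSH session cancels the run.

```shell
#!/bin/sh
# Sketch only: SKIP_FILE, run_after_grace and apply_risky_config are
# hypothetical names, not part of udm-utilities.

SKIP_FILE="${SKIP_FILE:-/tmp/skip-risky-script}"

# Wait up to $1 seconds, then run the remaining arguments as a command.
# Returns 1 without running anything if SKIP_FILE exists at any check.
run_after_grace() {
    remaining="$1"
    shift
    while [ "$remaining" -gt 0 ]; do
        if [ -e "$SKIP_FILE" ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    if [ -e "$SKIP_FILE" ]; then
        return 1
    fi
    "$@"
}

# Placeholder for the real route / ip rule commands.
apply_risky_config() {
    echo "applying risky network config"
}
```

The boot script would then end with something like `run_after_grace 180 apply_risky_config`, and `touch /tmp/skip-risky-script` within those 3 minutes would stop it for that boot.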

Any other suggestions?

Would it make sense to have a generic fail-safe mechanism built into udm-utilities?

Since similar situations might happen to anyone, maybe some better mechanism might be included in the main udm-utilities scripts.

Something like this: if it detects several reboots in sequence, start adding a delay before executing the scripts. This delay would increase as more reboots occur within a small time period.
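Roughly, that escalating delay could look like the sketch below. Everything in it is an assumption for illustration: the function name, the state file path under `/mnt/data` (assumed to survive reboots), and the window, step and cap values. It also assumes the clock is reasonably correct at boot, which may not hold before time sync.

```shell
#!/bin/sh
# Sketch only: all names, paths and values below are hypothetical.

STATE_FILE="${STATE_FILE:-/mnt/data/boot-failsafe.log}"  # assumed persistent
WINDOW="${WINDOW:-900}"        # seconds in which boots count as "in sequence"
STEP="${STEP:-60}"             # extra delay per recent boot, in seconds
MAX_DELAY="${MAX_DELAY:-600}"  # upper bound for the delay, in seconds

# Record a boot at epoch time $1 and print the delay (in seconds) to apply.
record_boot_and_get_delay() {
    now="$1"
    cutoff=$((now - WINDOW))
    tmp="${STATE_FILE}.tmp"
    # Keep only boots that fall inside the window, then add this one.
    if [ -f "$STATE_FILE" ]; then
        awk -v c="$cutoff" '$1 >= c' "$STATE_FILE" > "$tmp"
    else
        : > "$tmp"
    fi
    echo "$now" >> "$tmp"
    mv "$tmp" "$STATE_FILE"
    recent="$(awk 'END { print NR }' "$STATE_FILE")"
    # The first boot in the window gets no delay; each further one adds STEP.
    delay=$(( (recent - 1) * STEP ))
    if [ "$delay" -gt "$MAX_DELAY" ]; then
        delay="$MAX_DELAY"
    fi
    echo "$delay"
}
```

A boot script could call `sleep "$(record_boot_and_get_delay "$(date +%s)")"` before the risky commands. After a long stable uptime the old entries fall outside the window, so the delay drops back to zero on the next boot.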

Note that checking for script success (absence of errors, an exit code indicating success) is not what we're looking for here. A successful script might be the one that wrecks the server.

What do you think?

pgorod avatar May 21 '21 19:05 pgorod

In case it's of interest, here's the case in point, near the end of my reply here: https://serverfault.com/questions/1063428/routing-traffic-for-specific-port-range/1064315#1064315

That could be a nice sample script for udm-utilities, but someone more competent than me would have to make it prettier...

pgorod avatar May 22 '21 15:05 pgorod

What about detecting the presence or absence of a peripheral, and either delaying the boot script or reverting to the factory boot script based on that?
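One possible form of this, purely as a sketch: treat a cable plugged into a chosen spare port as the "peripheral" and read the link state from Linux sysfs. Whether the UDM exposes a per-port `carrier` file this way is an assumption, and `NET_SYSFS`, `link_is_up` and the interface name are hypothetical.

```shell
#!/bin/sh
# Sketch only: NET_SYSFS, link_is_up and any interface name passed in are
# hypothetical; per-port carrier files on the UDM are an assumption.

NET_SYSFS="${NET_SYSFS:-/sys/class/net}"

# Succeeds if the kernel reports carrier (a live link) on interface $1.
# A missing or unreadable carrier file counts as "link down".
link_is_up() {
    carrier="$(cat "$NET_SYSFS/$1/carrier" 2>/dev/null || true)"
    [ "$carrier" = "1" ]
}
```

A boot script could then skip its risky part with something like `if link_is_up eth3; then exit 0; fi`, where `eth3` stands in for whichever port is reserved as the safety jumper.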

akaihola avatar May 24 '21 05:05 akaihola

@akaihola I thought about something similar, and even tried an experiment: connecting a cable from one port directly into another. The logic was that the switch would supposedly detect the loop via STP and disable one of the ports. This unusual sign could be used by the script to halt execution.

But on the UDM, each port is a separate adapter, and it just goes on and on, no disabling occurs, and I don't think it can be detected from software. Unless someone knows a way.

But yeah, that is the sort of thinking that could get us a clever solution. Can you think of any peripheral that could be practical for detection?

pgorod avatar May 24 '21 12:05 pgorod

Sorry, I'm only getting my first UDM tomorrow, so I don't really know enough details about the device yet.

A USB port would be handy for this purpose, but it seems the device doesn't have one. If looping the ethernet ports can't be detected, I cannot think of any other obvious solution which would be readily available without special equipment.

akaihola avatar May 24 '21 19:05 akaihola

I wonder if looping the ports can be detected with some clever packet generation + tcpdump scanning... 🤔

pgorod avatar May 25 '21 09:05 pgorod