sonic-utilities icon indicating copy to clipboard operation
sonic-utilities copied to clipboard

[fast/warm-reboot] add cpufreq.default_governor=performance to BOOT_OPTIONS

Open stepanblyschak opened this issue 1 year ago • 5 comments

What I did

Set cpufreq.default_governor to performance for faster boot time. We observe consistent 1 sec improvement across several devices.

NOTE: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

DEPENDS ON: https://github.com/sonic-net/sonic-buildimage/pull/19634

How I did it

Append this option to BOOT_OPTIONS variable.

How to verify it

Run fast-reboot or warm-reboot. Check:

admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance

After boot is finalized check that it is reset back to default:

admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

stepanblyschak avatar Jul 19 '24 07:07 stepanblyschak

We observe consistent 1 sec improvement across several devices.

What platform and T0 topology were this change tested on? Can you share what changed in the boot up sequence when default_governor=performance?

We need to make sure that this inbuilt performance improvement will not lead to some unexpected bootup sequence, such as race conditions, unexpected preemption sequence, etc.

1s improvement is too tricky to be rightfully attributed to this change. How certain are we that 1s improvement is due to this change? If we know that a certain sequence that got changed due to enabling performance mode, can we not just address that as an enhancement, rather than depending on the default_governor=performance

vaibhavhd avatar Aug 12 '24 16:08 vaibhavhd

@vaibhavhd

We observe consistent 1 sec improvement across several devices.

What platform and T0 topology were this change tested on?

Results were shared over emails.

Can you share what changed in the boot up sequence when default_governor=performance?

Could you please clarify what information you are requesting? This PR does not change the boot sequence.

We need to make sure that this inbuilt performance improvement will not lead to some unexpected bootup sequence, such as race conditions, unexpected preemption sequence, etc.

I consider this a good thing, if this change exposes unexpected bootup sequence or a race condition we will catch it and fix it rather then waiting for a hidden bug to occur in production with .1% repro rate without this change. Please note, we are not requesting this enhancement for a release branch.

1s improvement is too tricky to be rightfully attributed to this change. How certain are we that 1s improvement is due to this change? If we know that a certain sequence that got changed due to enabling performance mode, can we not just address that as an enhancement, rather than depending on the default_governor=performance

fast-reboot test gives a consistent result with deviations less then 1s on nvidia platform. Another our platform with another CPU gives 1.5 sec improvement. A reduction of the average time may be considered an improvement. If you'd like us to prove it with statistics, please suggest the number of runs that will raise the confidence.

stepanblyschak avatar Aug 13 '24 10:08 stepanblyschak

@vaibhavhd Can you please help review?

bingwang-ms avatar Aug 26 '24 21:08 bingwang-ms

@saiarcot895 please review this PR.

vaibhavhd avatar Sep 09 '24 16:09 vaibhavhd

Why is this being done only for reboot? Why not for service warm-restart? Has it been considered what other scenarios will/may benefit from this?

vaibhavhd avatar Sep 09 '24 16:09 vaibhavhd

Why is this being done only for reboot? Why not for service warm-restart? Has it been considered what other scenarios will/may benefit from this?

@vaibhavhd I do not find performance issues with service warm-restart

stepanblyschak avatar Sep 16 '24 09:09 stepanblyschak