
Warn user & don't use performance mode if machine is temperature limited

Open eero-t opened this issue 5 years ago • 0 comments

Is your feature request related to a problem? Please describe.

If the machine is temperature limited, performance mode won't do any good. Instead, the user should be notified about the issue, as fixing failing or inadequate cooling is one of the cheapest ways to significantly increase performance.

Describe alternatives you've considered

  • Gamemode depends on something that already monitors temperature issues and warns the user about them, or
  • Gamemode itself monitors temperatures and their alarms from sysfs

Additional context

The hwmon sysfs interface provides temperature and power value files, files for their limits, and alarm files telling whether those limits were triggered: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface.rst

I think the *_alarm entries trigger only when the *_max/_crit limits are exceeded, but I don't think that happens unless cooling really fails (i.e. the temperature is too high even when chip frequencies are lowered).

Therefore the rule for deciding when a device is temperature limited is that the *_input value is close (say, within a degree or two) to the *_max/_crit limit. Because temperature changes much more slowly than chip frequencies, polling the values at e.g. a 1s interval should be fine.
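A minimal sketch of that rule, in Python for brevity rather than as proposed Gamemode code. The function names, the 2-degree margin and the "ignore limits that are zero or negative" filter are my own assumptions; the only thing taken from the kernel documentation is that hwmon temperature files are in millidegrees Celsius.

```python
from pathlib import Path

# "Within a degree or two": hwmon temperatures are in millidegrees Celsius.
MARGIN_MILLIDEG = 2000  # assumed margin, tune as needed


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def temperature_limited(hwmon_root="/sys/class/hwmon", margin=MARGIN_MILLIDEG):
    """Return (input file, current value, nearest limit) for hot sensors."""
    hits = []
    for input_file in sorted(Path(hwmon_root).glob("hwmon*/temp*_input")):
        current = read_int(input_file)
        if current is None:
            continue
        prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
        limits = [
            read_int(input_file.with_name(f"{prefix}_{kind}"))
            for kind in ("max", "crit")
        ]
        # Assumption: a missing, zero or negative limit is treated as unset.
        limits = [value for value in limits if value is not None and value > 0]
        if limits and current >= min(limits) - margin:
            hits.append((str(input_file), current, min(limits)))
    return hits
```

Calling `temperature_limited()` once per second from whatever loop already exists would match the 1s polling suggested above; a non-empty result means at least one sensor is within the margin of its limit.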

Because *_alarm files keep their values (until reboot?), they don't need to be sampled; they can be checked whenever it's most convenient.
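Checking the alarm files could look roughly like the sketch below (again Python, hypothetical function name). It assumes an alarm file reads "1" when triggered, as described in the hwmon sysfs documentation; whether the value actually stays set until reboot is the open question above and is not something this code verifies.

```python
from pathlib import Path


def triggered_alarms(hwmon_root="/sys/class/hwmon"):
    """Return paths of temp*/power* alarm files that currently read 1."""
    triggered = []
    root = Path(hwmon_root)
    # temp*_alarm also matches temp1_max_alarm, temp1_crit_alarm, etc.
    for pattern in ("hwmon*/temp*_alarm", "hwmon*/power*_alarm"):
        for alarm_file in sorted(root.glob(pattern)):
            try:
                value = alarm_file.read_text().strip()
            except OSError:
                continue  # unreadable entry, skip it
            if value == "1":
                triggered.append(str(alarm_file))
    return triggered
```

This only needs to run at a convenient moment, for example when a game starts or exits, rather than on every polling tick.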

Then there's also: https://www.kernel.org/doc/Documentation/hwmon/acpi_power_meter.rst

"Some computers have the ability to enforce a power cap in hardware. If this is the case, the power*_cap and related sysfs files will appear. When the average power consumption exceeds the cap, an ACPI event will be broadcast on the netlink event socket and a poll notification will be sent to the appropriate power*_alarm file to indicate that capping has begun, and the hardware has taken action to reduce power consumption."

The message to the user could be something like: "Detected that device performance is temperature limited. Please make sure that your device is not too close to heat sources (radiator, other machines or their power bricks, direct sunlight), that airflow to its air vents isn't obstructed, and that the device is level. If that doesn't help, make sure the device fans work properly and are clean. If the device is still temperature limited, you may need to service its cooling (replace old thermal paste etc)."

(We had one device using heat pipes whose cooling stopped working properly if it sat at a slight angle for a few days. Setting the device level fixed that. Another device had a fan that still worked, but it was filled with the dust-bunny socks it had been knitting... Removing those multiplied the laptop's performance.)

eero-t, Jan 10 '20 11:01