Embedded Controller does not read temperatures continuously, leading to no fan control
Hello!
For some time now I have noticed that my Framework Laptop 13 (AMD 7840U) gets really hot. Uncomfortably hot. When that happens, the fan does not spin at all. At other times the fan does spin up, often at full speed.
The laptop runs Arch Linux with the latest BIOS firmware, although I noticed this with previous versions as well.
I used ectool and framework_tool to investigate and found that neither tool reports updated temperatures under load (compiling or s-tui stress mode), so the EC sees no need to spin up the fans according to its limits. As expected, this leads to CPU throttling and a burning hot chassis.
After a fresh boot, all reported temperatures are occasionally zero (0 °C). This might fix itself after a suspend/wake cycle. Sometimes, after a short period (< 2 min), the EC starts reading temperatures, but the recorded values then remain static irrespective of CPU load. Sometimes it never starts reading at all. Once I repeat a sleep/wake cycle, the temperature often updates and the fan spins up if needed, which is almost always the case, and the fans go full blast to cool down the APU.
Since the EC also does not notice the temperature dropping once the load ends, the fan stays on full blast indefinitely, or until I suspend the laptop again.
This behavior occurs both on battery and on AC power.
Could you please tell me where to look for logs or further details on why the EC gets stuck in its temperature measurements?
My understanding is that the measurement should be continuous, e.g. like all the other sensors that the sensors command reports.
Attachments:
EC version:
workframe in ~
✗ sudo framework_tool --versions
Mainboard Hardware
Type: Laptop 13 (AMD Ryzen 7040Series)
Revision: MassProduction
UEFI BIOS
Version: 03.09
Release Date: 04/22/2025
EC Firmware
Build version: azalea_v3.4.113385-ec:c25dec,os:7b88e1,cmsis:4aa3ff 2025-04-14 01:55:38 marigold1@ip-172-26-3-226
Current image: RO
PD Controllers
Right (01): 0.0.1C (MainFw)
Left (23): 0.0.1C (MainFw)
Touchpad
Firmware Version: v0704
HDMI Expansion Card
Active Firmware: 106 (3.0.10.06A, MainFw)
workframe in ~ took 0s
❯ sudo ectool version
RO version: azalea_v3.4.113385-ec:c25dec,os
RW version: azalea_v3.4.113385-ec:c25dec,os
Firmware copy: RO
Build info: azalea_v3.4.113385-ec:c25dec,os:7b88e1,cmsis:4aa3ff 2025-04-14 01:55:38 marigold1@ip-172-26-3-226
Tool version: 0.0.1-isolate Apr 30 2025 none
No reporting of Temperatures after Boot/Sleep/Wake Cycle:
Every 1.0s: sudo framework_tool -vvvvv --thermal workframe: 02:21:31 AM
in 0.019s (0)
[DEBUG] Chromium EC Driver: CrosEc
[TRACE] get_smbios() linux entry
[DEBUG] read_memory(offset=0x0, size=0xF)
[DEBUG] read_memory(offset=0x10, size=0x8)
F75303_Local: 0 C
F75303_CPU: 0 C
F75303_DDR: 0 C
APU: Error
Fan Speed: 0 RPM
[INFO ] Fan Speed: Not present
[INFO ] Fan Speed: Not present
[INFO ] Fan Speed: Not present
On AC/battery readout after successful suspend/wake cycle. Initial state prior to the next two pictures.
workframe in ~
❯ sudo framework_tool -vvvvv --thermal
[DEBUG] Chromium EC Driver: CrosEc
[TRACE] get_smbios() linux entry
[DEBUG] read_memory(offset=0x0, size=0xF)
[DEBUG] read_memory(offset=0x10, size=0x8)
F75303_Local: 42 C
F75303_CPU: 43 C
F75303_DDR: 42 C
APU: Error
Fan Speed: 2340 RPM
[INFO ] Fan Speed: Not present
[INFO ] Fan Speed: Not present
[INFO ] Fan Speed: Not present
System in s-tui stress mode after the initial setup:
System still in stress mode, after suspend/wake cycle. Temperatures got updated, fans immediately ramp up to cool the CPU. Also notice the 500 MHz increase in clock speed once the fans provide more cooling.
Was looking for info, encountered the same thing just recently.
Mainboard Hardware
Type: Laptop 13 (AMD Ryzen AI 300 Series)
Revision: MassProduction
UEFI BIOS
Version: 03.03
Release Date: 03/10/2025
EC Firmware
Build version: lilac-3.0.3-413f018 2025-03-06 05:45:28 marigold2@ip-172-26-3-226
Current image: RO
PD Controllers
Right (01): 0.0.0B (MainFw)
Left (23): 0.0.0B (MainFw)
Laptop Webcam Module (2nd Gen)
Firmware Version: 1.1.1
Touchpad
Firmware Version: v0704
HDMI Expansion Card
Active Firmware: 106 (3.0.10.06A, MainFw)
I noticed the laptop was very hot and had just finished charging. I had expected the fans to run but they did not, so I checked the temps and noticed they looked low. When I manually turned the fans on, the readings did not change either. After a restart of the laptop, I noticed them begin changing over time as expected.
(is there some command I can run to restart whatever is doing this manually, so that I can at least recover without restarting the laptop entirely?)
I've been facing this issue on Linux as well. The laptop will get super hot and stay at high 90°C temperatures, then after some time it will drop to 65-67°C and the fans are reported to be spinning at 4000-4200 RPM and the chassis is hot. When engaging all cores - stress test, compiling code, etc, the CPU will be throttled hard and not reach more than 1.4-1.7GHz.
Restarting the laptop fixes the issue temporarily until it occurs again.
Mainboard Hardware
Type: Laptop 13 (AMD Ryzen 7040Series)
Revision: MassProduction
UEFI BIOS
Version: 03.09
Release Date: 04/22/2025
EC Firmware
Build version: azalea_v3.4.113385-ec:c25dec,os:7b88e1,cmsis:4aa3ff 2025-04-14 01:55:38 marigold1@ip-172-26-3-226
Current image: RO
PD Controllers
Right (01): 0.0.1C (MainFw)
Left (23): 0.0.1C (MainFw)
Touchpad
Firmware Version: v0704
Currently running Gentoo Linux with kernel 6.15.6, but this has been happening on previous kernel versions as well.
I'm afraid to revert my BIOS version in case something breaks because this is my work machine as well.
Edit:
Running framework_tool --fansetrpm 7000 will actually set the fans to their max speed and start to cool off the CPU. However, even though the thermal reports are showing low temperatures, the CPU is still throttled when engaging all cores:
F75303_Local: 55 C
F75303_CPU: 47 C
F75303_DDR: 51 C
APU: 60 C
Fan Speed: 6826 RPM
Edit 2:
After a direct reboot, running a stress test starts at 3.9GHz and immediately gets throttled even though temperatures are showing low-to-mid 60°C. And stressing the CPU doesn't increase the temperature even if the fans are running at the highest RPM. I really have no idea what's going on at this point, I'll wait for it to cool down even more.
What is your current BIOS @Kaukov ? 3.05 ? Or 3.09 ?
UEFI BIOS
Version: 03.09
Release Date: 04/22/2025
I just did a quick experiment.
After the issue appeared again and the sensors under Linux were stuck at 60-something °C and the fans refused to spin up, I shut down the laptop and put in a Windows 11 SSD I have lying around.
When Windows 11 booted up, the fans spun up to their maximum and HWMonitor showed 92°C. When I ran a stress test, the CPU frequency was high and the fans were working fine. I didn't experience any throttling.
After that I quickly switched back to the Linux SSD and CPU frequency and temperature are iffy. The frequency, when stressing the CPU, sometimes drops to 1.6-1.8GHz for parts of a second, then goes back up to what it was previously. It's been 5 minutes and the issue still hasn't occurred, but I expect it to in the next hour or so.
So my conclusion is that it might be Linux-related, possibly something from the 6.15 kernel.
I don't know that that test is revealing. Actually, I thought I had posted this workaround here but I see now that I did not. All you have to do to "reset" it is sleep the laptop and bring it back up. Whatever state gets broken is resolved by cycling in that way. Certainly by a power cycle. So, since it seems to be intermittent (I don't yet know of anything specific that triggers it), just booting to Windows isn't enough to say "it doesn't happen on Windows"; and since power cycling when it's impaired clearly recovers from the state, it's not enough to say that it was broken when you booted Windows...
Interestingly in my case this workaround doesn't work. After suspending and waking up, the CPU is still throttled and the fans don't spin up.
I posted about Windows, because even restarting Linux doesn't fix it most of the time or even all the time. Windows just made an impression because it instantly picked up the proper temperature and spun up the fans.
Huh, that's interesting. What I've been doing if it "feels hot" is sudo watch -n 1 'framework_tool --thermal'. If I don't see the temps changing, I press the power button (which enters ... some sleep state. I don't know how to determine the full specifics, though I'm aware that there are lots of nuances here and varying support). I immediately hit a key and it comes back on. The fans don't always spin up if it's not super hot, but the temp readings begin to vary again. I think the fans running or not may be a separate issue; they seem to sometimes blow when they don't seem to need to, and other times they don't when I'd have expected them to. Like, the "sensitivity" to heat seems to vary over time.
Edit: Possibly, temps are "reading" again, but actually presenting the wrong number? That could explain why they seem to be "sensitive" sometimes and other times quite lax, contradictory to how hot the case feels subjectively
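The manual check described above (watch the readings and see whether they ever change) could be automated roughly as follows. This is only a sketch: the function names are made up for illustration, and the parser assumes the output format shown in the framework_tool --thermal pastes in this thread. Identical readings are only suspicious under changing load, since an idle machine can legitimately report the same values for a while.

```python
import re
import subprocess
import time

# Matches lines such as "F75303_CPU: 43 C" or "APU: 60 C".
# Lines like "APU: Error" or "Fan Speed: 2340 RPM" do not match.
TEMP_LINE = re.compile(r"^\s*([A-Za-z0-9_]+):\s+(-?\d+)\s*C\s*$")


def parse_thermal(text):
    """Return {sensor_name: temperature_in_C} parsed from thermal output text."""
    temps = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            temps[match.group(1)] = int(match.group(2))
    return temps


def looks_stuck(samples):
    """True if there are at least two samples and all of them are identical."""
    return len(samples) >= 2 and all(s == samples[0] for s in samples)


def watch(interval_s=1.0, count=30):
    """Poll the tool `count` times (needs root); True if readings never changed."""
    samples = []
    for _ in range(count):
        out = subprocess.run(
            ["framework_tool", "--thermal"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append(parse_thermal(out))
        time.sleep(interval_s)
    return looks_stuck(samples)
```

Calling watch() while a stress test is running and getting True back would match the "frozen readings" symptom; it says nothing about whether changing readings are actually correct.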
I'm not sure how to check if the CPU is being throttled - how do you do that?
You might be correct that sleeping works. It didn't at first, so I left the laptop to get a drink. After coming back and trying another stress test, the frequency increased properly and the fans started kicking in after the laptop had been at 99-100°C for about 5 seconds.
I'm using s-tui and btop to stress and monitor the system.
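For a check without s-tui or btop, the current per-core clocks can also be read from the Linux cpufreq files in sysfs. A rough sketch, assuming the standard scaling_cur_freq files are present; the helper names and the 2.5 GHz cutoff are assumptions picked for illustration, not a documented threshold:

```python
from pathlib import Path

# Standard Linux cpufreq location; values in these files are in kHz.
CPUFREQ_GLOB = "cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_core_freqs_khz(root="/sys/devices/system/cpu"):
    """Read the current frequency (kHz) of every core that exposes cpufreq."""
    return [int(p.read_text()) for p in sorted(Path(root).glob(CPUFREQ_GLOB))]


def throttled_under_load(freqs_khz, floor_ghz=2.5):
    """Heuristic: True if the average clock is below `floor_ghz`.

    Only meaningful while all cores are busy; idle cores clock down on
    their own. The default floor is an assumed value, not a spec limit.
    """
    if not freqs_khz:
        raise ValueError("no frequency samples")
    avg_ghz = sum(freqs_khz) / len(freqs_khz) / 1_000_000
    return avg_ghz < floor_ghz
```

Run a stress test first, then call throttled_under_load(read_core_freqs_khz()); a low average under full load is a hint of throttling, not proof of its cause.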
I ran the framework_tool --thermal command, but it displays the following temps:
F75303_Local: 59 C
F75303_CPU: 59 C
F75303_DDR: 51 C
APU: 93 C
Fan Speed: 6874 RPM
Interestingly, only the APU temp increases, but the F75303_CPU sensor stays cool at 50-55°C
I had some time and ran Windows 11 for a longer period of time. It took about 30 minutes for the issue to occur. The only difference with Linux is that the fans are actually spinning at higher speeds even if everything else is limited.
I noticed the CPU dropped its voltage and refused to use more than 600mV. The CPU power also stopped going above 15W. The frequency is still throttled and doesn't increase when stressing the CPU.
I noticed a thread on the official forums - https://community.frame.work/t/amd-7840-fan-and-temp-issue-negative-temp-fan-is-not-starting/72337. I will go back to Linux in a few hours and see if I also get negative temperature readings via acpi.
Edit:
Disregard everything... Even though today it lasted many hours without occurring, it just now occurred again and the CPU is throttling and the temperature is reading in the 60s again. acpi -t shows "normal" readings:
Thermal 0: ok, 53.8 degrees C
Thermal 1: ok, 61.8 degrees C
Thermal 2: ok, 61.8 degrees C
Thermal 3: ok, 59.8 degrees C
At this point I'm lost.
~~Update from today:~~
~~When switching back to the Linux SSD, I decided to see if there was dust somewhere. Started blowing on the motherboard and the cooler with compressed air - nothing came out. But I noticed a speck of dust lint on one of the corners of the cooler intake. It was literally a millimeter in size. Removed it carefully with the Framework Screwdriver.~~
~~After that I ran a long 6-hour compile session of packages, then I compiled the Linux kernel, then I gamed - the system hasn't been more stable. It was keeping a very solid and high CPU frequency, the fans worked perfectly fine. Zero drops in performance and no occurrences of the issue.~~
~~My conclusion now is that this little millimeter-long-and-wide speck was causing all the issues OR blowing with compressed air cleared something but it wasn't noticeable to the naked eye.~~
~~Overall I'm no longer facing issues even if the ambient temperature in the room is quite hot today due to the summer heat.~~
~~Can you also check and see if you have any tiny specks somewhere on your cooler?~~
For what it's worth, it seems like the fan problem is just a symptom. It is directly explained by incorrect temperature readings:
My laptop was feeling a little warm (though not burning), so I used acpi -t to log the temps before and after a sleep:
Before:
Thermal 0: ok, 47.8 degrees C
Thermal 1: ok, 50.8 degrees C
Thermal 2: ok, 43.8 degrees C
Thermal 3: ok, 38.8 degrees C
immediately after a sleep/wake:
Thermal 0: ok, 48.8 degrees C
Thermal 1: ok, 68.8 degrees C
Thermal 2: ok, 43.8 degrees C
Thermal 3: ok, 38.8 degrees C
That is an 18 degrees C difference in the "Thermal 1" reading before and after sleeping.
Notably, the temperature readings were not stuck/frozen as I had previously thought; I have another reading from "before" where "Thermal 1" reported 51.8 before sleeping.
I believe "Thermal 1" is the CPU temp.
Edit: It's plausible that either the sleep process turning off the fans lets heat accumulate quickly, or that waking from sleep briefly spikes the CPU, so this isn't entirely definitive. However, the amount it jumped by is suspicious.
As a counterexample, I just did a sleep-wake again now that the fan has cooled things down and it did not jump up to 68C
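The before/after comparison can be scripted so the per-zone jump is computed rather than eyeballed. A small sketch, assuming acpi -t output in the format pasted above; the function names are illustrative only:

```python
import re

# Matches lines such as "Thermal 1: ok, 50.8 degrees C".
ACPI_LINE = re.compile(r"^Thermal (\d+): \w+, (-?\d+(?:\.\d+)?) degrees C$")


def parse_acpi(text):
    """Return {zone_index: temperature_in_C} from acpi -t style text."""
    zones = {}
    for line in text.splitlines():
        match = ACPI_LINE.match(line.strip())
        if match:
            zones[int(match.group(1))] = float(match.group(2))
    return zones


def zone_deltas(before, after):
    """Per-zone change (after minus before), rounded to 0.1 C."""
    return {z: round(after[z] - before[z], 1) for z in before if z in after}


before = parse_acpi(
    "Thermal 0: ok, 47.8 degrees C\nThermal 1: ok, 50.8 degrees C\n"
    "Thermal 2: ok, 43.8 degrees C\nThermal 3: ok, 38.8 degrees C"
)
after = parse_acpi(
    "Thermal 0: ok, 48.8 degrees C\nThermal 1: ok, 68.8 degrees C\n"
    "Thermal 2: ok, 43.8 degrees C\nThermal 3: ok, 38.8 degrees C"
)
deltas = zone_deltas(before, after)  # zone 1 changes by 18.0, zone 0 by 1.0
```

Logging these deltas across many sleep/wake cycles would show whether the large jump is a one-off or tied to the impaired state.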
I'm suspicious that what's going on here is a kind of broken averaging behavior somehow?
As best I can tell, it seems to occur after the laptop has been active for a period of time and resets when the power / sleep cycles. The fact that the temp just gets "lower and more stable" leans me in the direction of something like that. I could imagine, for example, that the value being read is intended / expected to be an exponentially weighted moving average to provide stability in fan activation. Early during use, there are instances where the fans spin up for like a few hundred milliseconds and stop again. As the temperature "smooths out" (upon gaining more data), the fans are less bursty.
However, if the sample set is extremely long, and recent samples are not weighted heavily enough, it could become very slow to adapt.
I don't know how to investigate this in the code, but it models the behavior I seem to be observing at least.
Edit: @Kaukov this would also explain why your stress test doesn't seem to reproduce the behavior easily. If you're stressing the computer you're pushing the average up so the reported temperature stays higher.
If I'm correct, the repro would be more like "leave it on, not sleeping, and idle for a long period of time -- then run a stress test and see how quickly it reacts"
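To make the averaging hypothesis concrete, here is a toy model of an exponentially weighted moving average. It is not taken from the EC firmware; the smoothing factors, the starting temperature, and the one-sample-per-second rate are all invented for illustration. It only shows how a filter that weights recent samples too lightly would lag far behind a sudden load.

```python
def ewma(samples, alpha, initial):
    """Exponentially weighted moving average.

    `alpha` is the weight of the newest sample (0 < alpha <= 1).
    Returns the smoothed value after each input sample.
    """
    smoothed = []
    value = initial
    for x in samples:
        value = alpha * x + (1 - alpha) * value
        smoothed.append(value)
    return smoothed


# Hypothetical scenario: the true temperature steps from 45 C (idle) to
# 95 C (full load) and is sampled 60 times.
load = [95.0] * 60
slow = ewma(load, alpha=0.01, initial=45.0)  # recent samples barely count
fast = ewma(load, alpha=0.5, initial=45.0)   # recent samples dominate
```

With these made-up numbers the slow filter is still below 70 C after 60 samples while the fast one has essentially reached 95 C, which is the kind of sluggishness the hypothesis describes. Whether the EC does anything like this is unverified.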
So I did around 36 hours of testing during the weekend and I have a new hypothesis.
I did not experience any slowdowns or throttling even when stressing the CPU to the max for many consecutive hours at a time. I also tried compiling code, gaming, then stressing the CPU - everything was fine.
But I noticed something interesting - at one point late in the evening, a small sun ray was shining on the input cover a bit above the CPU and to the right, and after about 2 minutes the CPU started throttling. After moving the curtains and turning off the laptop for 5 minutes, the issue was gone again and never occurred again. I haven't shut down the laptop since yesterday afternoon and it's still running perfectly fine.
So my hypothesis is that when the input cover gets hot enough in specific areas, it also influences the readings of the thermal sensors on the mainboard, which in turn throttles the CPU and messes with temperature readings and fan control.
It's just a hypothesis, but it's the only thing that makes sense to me at the moment...
I tried stressing the CPU multiple times today after the laptop had been idle for the whole night - it worked perfectly fine, the temperature got high, the fans spun up, the CPU frequency was also very high. I did multiple stress tests and gaming sessions - all is good.
Edit:
So even after everything I wrote above, I just experienced the throttling again. It happened out of nowhere, I was playing a game and in a work call on Google Meet. Nothing out of the ordinary. So everything I wrote above turns out to be useless.
I've been monitoring the issue more and more these past days and I'm baffled by the results I get.
The laptop ran fine for 2 days - working, playing games, compiling code and it kept spinning up the fans and the CPU didn't throttle once.
Then, out of nowhere while I was in a call and started a game, it started throttling... I decided to not play the game and after the call ended, I just left the laptop to idle for an hour or so. The issue was gone on its own and the CPU was able to hit high temperatures and frequencies.
Edit: The following is not relevant, it was a power-profiles-daemon issue not related to this one.
~~Last night I decided to put my laptop to sleep. Today, after resuming from sleep, it was immediately throttled and still is 40 minutes later.~~
~~I'm literally unable to determine what causes the issue other than some sensor being misled or the input cover getting so hot that it interferes with the mainboard's sensors.~~
Today I had some more time and I thought - what could a reproducible setup look like. So I decided to try the laptop with the lid closed and working only on an external monitor.
For many hours of playing games, coding, watching videos, all at the same time - the laptop was doing pretty good and the performance was on point. So I decided this wouldn't be effective in regards to this issue.
I also had some updates to install and LLVM to compile so it was a great test. After about 16 minutes the laptop produced the issue. I canceled all builds, shut down the laptop, waited for it to cool down (about 10 minutes) and powered it on. Ran the build again - after about 16 minutes it produced the issue. After a third time of doing this, it always produced the issue after about 16 minutes.
The behavior is the following:
- The laptop will engage all cores at 100%
- The fans will spin up to max RPM
- The CPU frequency will stay between 3.5 and 3.8 GHz
- The CPU temperature will float between 89°C and 99°C
- After about 16 minutes the temperature will drop to ~62°C, the fans will drop to ~2000RPM and the frequency will drop to 1.8-2.2GHz.
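If temperature, fan speed, and frequency are logged during such a run, the sudden drop in the last bullet can be located automatically. A minimal sketch: the Sample type, the function name, and both thresholds are assumptions chosen to fit the numbers in the list above, and the sample values in the usage lines are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    temp_c: float    # reported CPU temperature
    fan_rpm: int     # reported fan speed
    freq_ghz: float  # average core frequency


def find_collapse(samples, temp_drop_c=20.0, freq_floor_ghz=2.5):
    """Index of the first sample where the reported temperature falls by
    more than `temp_drop_c` versus the previous sample while the fan slows
    down and the clock is below `freq_floor_ghz`; None if that never happens.
    """
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if (
            prev.temp_c - cur.temp_c > temp_drop_c
            and cur.fan_rpm < prev.fan_rpm
            and cur.freq_ghz < freq_floor_ghz
        ):
            return i
    return None


# Illustrative log: two healthy samples, then the collapse.
log = [Sample(95, 6800, 3.6), Sample(92, 6800, 3.7), Sample(62, 2000, 2.0)]
collapse_at = find_collapse(log)
```

Recording the timestamp of that index over several runs would confirm or refute the "about 16 minutes" pattern.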
So I decided to try something else - unscrew all screws and offset the input cover down and float it a few mm above the laptop. Lo and behold - I don't experience the issue at all.
I'm inclined to believe the input cover gets so hot that it confuses some of the sensors or the EC, and nothing fixes it except turning off or suspending the laptop for a few minutes, during which the input cover cools down enough to no longer cause the issue. It's very interesting because the CPU will run at a high temperature (not at the temp limit) and a good high frequency and then just like that will drop to ~62°C and stop spinning the fans...
Another thing I noticed yet again was that when booting up, the CPU was at 100°C but the fans were at ~2000RPM and didn't ever spin faster until I started to build LLVM or watch a YT video.
@myndzi are you able to try it yourself as well? I doubt you need to compile LLVM, but maybe stress the CPU or do something else that will engage all cores at 100% for about 16 minutes with the lid closed.
After I wrote this and LLVM was close to completion, I screwed back the input cover while the laptop was running. The temperatures went about 3°C up, but nothing changed. What I noticed was that when the CPU cores were left unoccupied, the temperature spikes to 100°C, but when all cores get engaged again, the temperature drops to 90-91°C. The fans spin at the same RPM in all cases and the CPU frequency stays in the same range. I don't understand how the CPU temperature gets close to the thermal limit when the CPU stops being used at its limit, but then drops when all cores engage at 100%.
I would love to get some input from Framework on this...
Making a separate comment for visibility. Again, after about 16-20 minutes with the input cover attached, the issue appeared. The fans started spinning down, the CPU frequency is down, the reported CPU temperature is ~60°C and it will stay that way. I'm attaching a screenshot from s-tui
Update:
Compiling with the lid open allowed for about 90 minutes of constant CPU work before the issue happened again. And on Windows, it is the exact same thing.
However, I noticed something else when testing on Windows. When I interrupted the compilation, the fans started spinning faster. When I resumed the compilation, the fans spun down and the CPU was still limited.
I don't do anything intense at all. It crops up when doing nothing more than editing typescript in vscode for hours, or watching youtube videos. I might be able to reproduce by explicitly doing something intense, but I can only assume that the people that need a repro case are the devs, so unless they need more info / samples, I'm not going to bother.
In fact, until someone even responds to this issue, I'm just watching. I did get confirmation from support that someone at least has looked at it, but I don't know more than that.
That's great that support acknowledges someone is looking into it. I decided to try one last time after I just updated to the latest BIOS and can confirm the issue is now worse. I managed to experience it just 5 minutes after starting the compilation.
In any case, this is becoming a huge burden so I'll also be writing to support. Thank you for your input and I'll also stay as a watcher until someone more knowledgeable responds here.
This will be my last comment regarding the issue.
I got intrigued and disassembled my fan assembly without removing the cooler with the pipes. Inside the fan were a lot of dust balls which I gently removed. After that I cleaned up the rest, including the fan.
After assembling everything back together, I no longer experience any issues regarding temperature or frequency. I even compiled LLVM again with the lid closed for a nice extreme test. The laptop kept up without any issues. I also had a week's worth of work done, and a lot of idle hours for this whole week - nothing changed except I got a bit lower CPU temperatures.
How long have you had your Framework laptop, @Kaukov ?
(Just curious how long, on average, it takes before it's best to clean the fan units inside, even if it probably varies depending on usage.)
I've had the laptop for more than a year, but this mainboard is about 6 months old. However, I used the laptop in a very dusty environment and that definitely sped up the dust collection process. I doubt you'll need to clean it that often if not used in more extreme environments.
For what it's worth, I had mine for like a month or two before I noticed, and I did not see any dust buildup on the fan or anything like that. My symptoms appear to be different from kaukov's to an extent, though. A couple weeks ago, in fact, it was running the fans constantly and reporting consistently high temps, but I think that was plausibly some firefox tabs causing that. However, the fact that it did that persistently for a long period was novel. I'm at a bit of a loss honestly.
I haven't seen the symptom I reported recently; the laptop has felt hot to the touch with no fans, but sleep/wake did not cause the ~20 degree temperature jump and it did not feel hot enough to be alarming. So it's plausible that it hasn't cropped up again recently for me. I note that the primary times I noticed it were days when I was using vscode all day. I haven't done that recently, so perhaps there's some amount of low-level heat gain that accumulates from minor but persistent load.
I decided to upgrade my cooler to the newer one with PTM (released with the Ryzen 300 series). It definitely helped with temperatures, but the issue soon started occurring again, especially since I started using my laptop with the lid closed when I'm docked at home.
I started looking into the issue more and I found the culprit. It all comes from AMD's STAPM (Skin Temperature Aware Power Management). The Framework Laptop's design makes the whole chassis super hot and that triggers the STAPM to throttle the CPU until the chassis has cooled down, which won't happen until you either put the laptop to sleep, or you power it off for a few minutes.
And, thankfully, the Linux community is awesome enough that a solution already exists - https://github.com/AlxHnr/amd-ryzen-ignore-stapm. This single script, along with the kernel module it mentions, has fixed the issue. If I run the laptop on the balanced power profile, I hit the throttle pretty fast. Then I just switch to the performance profile, the script sees this and applies the fixed thresholds - I instantly get my performance back and the CPU's temperature doesn't hit thermal throttling at all.
I wish Framework released a BIOS disable toggle for this AMD feature.
Edit: Another tool is this one https://github.com/FlyGoat/RyzenAdj - it's also available on Windows.
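For anyone who wants to experiment with RyzenAdj directly, the sketch below only builds the command line and reads the active power profile; it does not reproduce what the amd-ryzen-ignore-stapm script does. The helper names are made up, the wattage is a placeholder and not a recommended value, and it assumes RyzenAdj's --stapm-limit, --fast-limit and --slow-limit options (which take milliwatts) and the powerprofilesctl tool from power-profiles-daemon.

```python
import subprocess


def current_profile():
    """Return the active power-profiles-daemon profile, e.g. "performance"."""
    result = subprocess.run(
        ["powerprofilesctl", "get"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def ryzenadj_args(sustained_w, boost_w=None):
    """Build an argv list for RyzenAdj; limits are converted to milliwatts.

    `sustained_w` and `boost_w` are caller-chosen values in watts. Nothing
    here checks that they are safe for a given machine.
    """
    boost_w = sustained_w if boost_w is None else boost_w
    return [
        "ryzenadj",
        f"--stapm-limit={int(sustained_w * 1000)}",
        f"--fast-limit={int(boost_w * 1000)}",
        f"--slow-limit={int(sustained_w * 1000)}",
    ]


# Placeholder value for illustration only.
args = ryzenadj_args(28)
```

A wrapper could run the resulting command as root (for example via subprocess.run(args, check=True)) only when current_profile() returns "performance", mirroring the profile-switch behavior described above. Raising power limits increases heat, so treat any values as experimental.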