peci: eSPI host stops responding

Open crawfxrd opened this issue 2 years ago • 12 comments

  • Model: CONFIG_PECI_OVER_ESPI=y
  • EC version: dd555b9012f9
  • OS: Pop!_OS 22.04
  • Kernel: 6.2.7-060207-generic

Cases:

  • Unplugging a display connected to the TBT port causes PECI GetTemp() timeouts
  • Suspend/resume causes PECI GetTemp() timeouts

Steps to reproduce

  1. Plug in a display to the TBT port
  2. Power on system, booting to OS
  3. Unplug the display

Expected behavior

  • PECI commands resume normal operation
  • CPU temp reporting is accurate

Actual behavior

  • PECI commands begin timing out and do not recover
  • CPU temp reported is always 0

Additional info

This specifically affects the PECI-over-eSPI implementation (CONFIG_PECI_OVER_ESPI=y).

Acknowledging the completion in #368 fixed it from hanging on shutdown.

Presumably, this could happen because of:

  • invalid/insufficient checks in peci_available()
  • missing the check to see if a transaction is currently in progress in peci_get_temp()
  • an issue specifically with eSPI usage?
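
As a rough illustration of the first two bullets, the precondition checks could be ordered as in the sketch below. This is not the actual EC code: `peci_get_temp_checked()`, the `peci_status` codes, the `peci_hooks` struct, and the `sim_*` stand-ins are all hypothetical names, and the hardware is simulated so the sketch runs on a workstation.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical status codes; the real peci_get_temp() may report errors differently.
enum peci_status {
    PECI_STATUS_OK = 0,
    PECI_STATUS_UNAVAILABLE, // host is not in a state where PECI can work
    PECI_STATUS_BUSY,        // an earlier transaction is still in flight
    PECI_STATUS_TIMEOUT,     // this transaction did not complete in time
};

// Hypothetical hooks standing in for the hardware-facing checks.
struct peci_hooks {
    bool (*available)(void);        // plays the role of peci_available()
    bool (*in_progress)(void);      // is a transaction currently running?
    bool (*transfer)(int16_t *raw); // issues GetTemp(); returns false on timeout
};

// Run the checks in order and only start a transfer when both pass.
enum peci_status peci_get_temp_checked(const struct peci_hooks *hooks, int16_t *raw) {
    if (!hooks->available()) {
        return PECI_STATUS_UNAVAILABLE;
    }
    if (hooks->in_progress()) {
        return PECI_STATUS_BUSY;
    }
    if (!hooks->transfer(raw)) {
        return PECI_STATUS_TIMEOUT;
    }
    return PECI_STATUS_OK;
}

// Simulated hardware state (assumption: purely for illustration).
static bool sim_available = true;
static bool sim_in_progress = false;
static bool sim_transfer_ok = true;
static int sim_transfer_calls = 0;

static bool sim_is_available(void) { return sim_available; }
static bool sim_is_in_progress(void) { return sim_in_progress; }

static bool sim_transfer(int16_t *raw) {
    sim_transfer_calls++;
    if (!sim_transfer_ok) {
        return false; // simulated timeout; *raw is left untouched
    }
    *raw = 42; // arbitrary simulated reading
    return true;
}

static const struct peci_hooks sim_hooks = {
    .available = sim_is_available,
    .in_progress = sim_is_in_progress,
    .transfer = sim_transfer,
};
```

Returning a status separately from the reading lets the caller tell "no reading" apart from a genuine value, instead of treating a failed read as a temperature of 0.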

Workaround when it gets in this state:

  • Completely power off system (not restart) and unplug from AC power

Blocks: #370

crawfxrd avatar Jun 21 '23 19:06 crawfxrd

PECI over eSPI is now used by

  • addw4
  • lemp13
  • lemp13-b
  • oryp11
  • oryp12

While GetTemp is the most likely PECI call to fail, #525 shows that WrPkgConfig also can fail because of this.

crawfxrd avatar Apr 10 '25 15:04 crawfxrd

Is there something I can do to help this move forward? I can reproduce it with my HW setup pretty easily so I can provide any additional debug logs if needed.

nextsux avatar May 18 '25 21:05 nextsux

Any chance this will be fixed soon? Something we can help with? It is a really bad experience having a nice and expensive laptop such as the lemp13 that makes more noise than my washing machine.

jaime-ez avatar Aug 06 '25 01:08 jaime-ez

@jaime-ez at this point I've worked around this by implementing a D-Bus service that switches the fan to manual mode, plus a slider in gnome-shell for changing the fan speed manually 🙈 Far from ideal, but I can share it after some cleanup if you're interested.

Not nice, not packaged but I've added some instructions:

https://github.com/nextsux/ec/tree/manual-fan-control

nextsux avatar Aug 06 '25 09:08 nextsux

For me, powering down, removing AC, and then powering back up fixes the problem. Annoying to do on a regular basis but not that big a deal.

ywwg avatar Aug 06 '25 13:08 ywwg

@nextsux thank you sir! I will try your solution.

If folks at system76 don't solve it soon they should at least link to your solution in their docs https://support.system76.com/articles/fan-noise

Powering down the computer every time I change workstation is unacceptable.

jaime-ez avatar Aug 06 '25 13:08 jaime-ez

@ywwg well.. I move with lemp13 daily. I usually have LOTS of work opened. And powering it off is really annoying. So I just live with my tool and manually change the fan speed. But I would say - totally sub-optimal.

I have all the tools for de-bricking my laptop, so I can (and I'm willing to) test any debugging version of the EC firmware. I've also tried to study the PECI specs so I can move this forward, but no luck. Without insights from @crawfxrd I'm completely lost here 😭 I can imagine there must be a lot on his plate, as EC seems to be a one-man show. But this is what the community is for. If we can't fix it directly, we can test, try debug versions, and provide logs. Also, this is not a random issue: I can reproduce it in 100% of cases.

Btw, what's interesting: I can reproduce it only with a Philips display. My other displays do not cause this. But with this Philips display it's enough to go to a meeting: the display suspends, and I come back to a laptop that's burning hot.

Which brings me to: @ywwg no, it IS a big deal. I cannot afford to reboot the laptop every time I leave it alone for just a few minutes and the display suspends. (And no, thanks to some stupid policy, that display cannot be set to disable this power saving. I can set it on the laptop, but then when I connect it at home, the display will be shining the whole night there.)

nextsux avatar Aug 06 '25 15:08 nextsux

> Btw, what's interesting: I can reproduce it only with a Philips display. My other displays do not cause this. But with this Philips display it's enough to go to a meeting: the display suspends, and I come back to a laptop that's burning hot.

Just to add a data point, I have the issue with a Samsung display.

jaime-ez avatar Aug 06 '25 16:08 jaime-ez

addw4 doesn't appear to have the issue (testing on 18824c7a317f), and its main difference from the other models is that it doesn't use TBT. So potentially a conflict with upstream transactions?

Things I can think of to try:

  • Check/Wait on ESUCTRL0_BUSY before starting the transaction
  • Write-clear only ESUCTRL0_DONE (per Figure 6-10)
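
A minimal sketch of those two ideas, assuming `ESUCTRL0_DONE` is a write-1-to-clear bit and `ESUCTRL0_BUSY` is a read-only status bit. The bit positions below are made up for illustration (the datasheet is the authority), the register is simulated with a plain variable so the code can run off-target, and `upstream_wait_idle()` / `upstream_ack_done()` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical bit positions, for illustration only; check the EC datasheet.
#define ESUCTRL0_BUSY (1u << 0)
#define ESUCTRL0_DONE (1u << 1)

// Simulated stand-in for the memory-mapped ESUCTRL0 register.
static uint8_t sim_esuctrl0;

uint8_t esuctrl0_read(void) {
    return sim_esuctrl0;
}

// Simulated write: DONE is modelled as write-1-to-clear, BUSY as read-only.
void esuctrl0_write(uint8_t value) {
    if (value & ESUCTRL0_DONE) {
        sim_esuctrl0 &= (uint8_t)~ESUCTRL0_DONE;
    }
}

// Poll until BUSY clears, giving up after max_polls extra reads.
// Returns false on timeout so the caller can skip starting a new transaction.
bool upstream_wait_idle(uint16_t max_polls) {
    while (esuctrl0_read() & ESUCTRL0_BUSY) {
        if (max_polls-- == 0) {
            return false;
        }
    }
    return true;
}

// Acknowledge completion by writing only the DONE bit, rather than
// writing the whole register value back to itself.
void upstream_ack_done(void) {
    esuctrl0_write(ESUCTRL0_DONE);
}
```

Writing only the DONE bit avoids touching any other bits that happen to be set when the register is read back, which a read-modify-write of the whole register would do.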

crawfxrd avatar Aug 15 '25 17:08 crawfxrd

Hi, apologies if this is not the right place to make this comment, but today I updated to firmware 2025-07-24_c242738 on my lemp13-b and my fan just stopped working. Take a look at the psensor screenshot; the CPU temp is not even reported:

[Image: psensor screenshot]

Has this issue been reported? Is there a workaround?

jaime-ez avatar Oct 28 '25 17:10 jaime-ez

> Has this issue been reported? Is there a workaround?

Try powering down, removing the AC connection, waiting, and powering back up.

ywwg avatar Oct 28 '25 17:10 ywwg

Thanks! I did it once after the firmware update, but apparently it takes two tries. Now psensor shows the temp and the fan starts above 65 °C.

jaime-ez avatar Oct 28 '25 17:10 jaime-ez