darp5: Not early-loading system76_acpi and/or attempting to use most system76_ectool commands crashes ec

Open Thulinma opened this issue 1 year ago • 1 comments

Hey there! I know this is technically an unsupported model, and an old one at that, but I figured I'd at least report this here. Ever since I installed the open EC on my darp5, most of the system76_ectool commands have either not worked or have crashed the EC (requiring me to disconnect the battery to restart it, as I found no other way to do so). I always figured this was just me using "something" unsupported on my older laptop model and didn't think much of it - I just didn't use the tool. These days I'm running EC commit fc3bad29a2a31555bccaaacb6af6d20b7bb1b7f6 - and in the last few Linux kernels (at least on 6.9.7) something seems to have changed that now causes the non-system76 ACPI kernel modules (I'm not sure which exactly, nor how I could find out) to do something that also crashes the EC in the same way. After some prodding, it looks like setting system76_acpi to load in early boot (by including it in the initrd image) prevents this crash from happening (likely because it prevents other ACPI drivers from probing and/or attempting to load?).

Thankfully, I have a workaround (load the system76_acpi kernel module early) that keeps things stable - but I kinda feel like it should not be possible to crash the EC (or at least not this trivially)... So I'm reporting it here. ^_^

If there is anything I can do to help debug this, do let me know! I'm honestly a little confused that (apparently?) this doesn't affect other models and/or nobody else noticed/reported the problem. At least on my system, simply trying to run e.g. system76_ectool security with any argument will already cause an EC crash to happen.

Just to make clear what I mean by "EC crash", the symptoms are:

  • Loss of keyboard
  • Loss of power button
  • Front LEDs no longer change in any way (stay on in their last state)
  • Loss of fan (this is probably the most dangerous one, since...)
  • The laptop does stay turned on! (Thankfully it does thermal throttling... 😌)
  • Shutting down the laptop from the OS does seem to work (besides any EC-controlled parts, like the LEDs, which stay in their previous state) but I can't easily tell if it's "really off" afterwards.
  • Holding down the power button (even for several minutes uninterrupted) does not reboot the EC or shut down the laptop at all.
  • Trying to turn the laptop back "on" after a software-induced shutdown has no effect. The only way out of this state I've found is disconnecting the battery and waiting a few seconds, then reconnecting it... which makes everything normal again.

Again, happy to help any way I can! That said, since I've had to go through speedrunning opening my laptop several times now, if there are any ways to force an EC restart that are faster/easier than disconnecting the battery I'd love to be made aware of them, independent of a potential fix for this issue 😅.

Thulinma avatar Jul 03 '24 23:07 Thulinma

> Holding down the power button (even for several minutes uninterrupted) does not reboot the EC or shut down the laptop at all.

PWRSW WDT 2 was enabled on boards using IT5570E in #315.

Could try enabling it on IT8587E (#473).

// PWRSW WDT 2 Enable
GCR8 = BIT(4);

Bit 5 of GCR9 (PWSW2EN2 on IT5570E) is marked reserved in the datasheet, but based on our experience with these it might exist and be required.
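
A minimal host-side sketch of the two register writes discussed above. `GCR8`, `GCR9`, `BIT`, and the bit positions are taken from this comment; everything else is an assumption for illustration. In particular, the registers are modelled as ordinary variables rather than memory-mapped locations, the write uses OR so other bits are preserved (the snippet above uses a plain assignment), and the helper name `pwrsw_wdt2_enable` is hypothetical.

```c
#include <stdint.h>

#define BIT(n) (1U << (n))

// Stand-ins for the EC registers (assumption: on hardware these are
// memory-mapped, not ordinary variables).
volatile uint8_t GCR8;
volatile uint8_t GCR9;

// Hypothetical helper: set the PWRSW WDT 2 enable bit in GCR8 and,
// optionally, bit 5 of GCR9 in case the reserved bit is required.
void pwrsw_wdt2_enable(int set_gcr9_bit5) {
    GCR8 = (uint8_t)(GCR8 | BIT(4));
    if (set_gcr9_bit5) {
        GCR9 = (uint8_t)(GCR9 | BIT(5));
    }
}
```

Keeping the GCR9 write behind a flag makes it easy to test both variants on hardware, since the datasheet and observed behaviour may disagree.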

crawfxrd avatar Jul 04 '24 00:07 crawfxrd

Do you happen to have a known working EC version?

crawfxrd avatar Jul 09 '24 17:07 crawfxrd

Define "known working"? I have backups of every EC version I've ever kept installed for more than a few hours:

  • The original EC the laptop came with (the non-open one).
  • The first open EC I installed (build from 2021-01-03), which never crashed on boot (that I recall) and occasionally locked up when "stressed" with ectool commands (specifically, printing EC messages would do it after some time, IIRC). I didn't use the commands afterwards to avoid crashes, and didn't feel a need to. It had keyboard problems.
  • An updated EC build from 2022-06-05, which I also don't remember ever crashing on boot, and which still had the keyboard problems. I noticed there were more ectool commands now, but all the new commands I tried would crash my EC, so I didn't try too many and never tried again after.
  • The update I just installed on 2024-06-05, which fixes the keyboard problems but does crash on boot (sometimes, I'd say 50/50) unless the system76_acpi module is early-loaded into the kernel (I have not had a crash since configuring that). Similar to the previous version, most ectool commands seem to crash this EC.

I have not yet had a chance to compile the modified version and see if it indeed lets me restart the EC by holding power - but am planning to do so sometime in the next few days. 🤞 Also happy to try other things if it could help get to the bottom of this! (For now, at least, it seems my laptop runs stable as long as I don't use ectool... So day-to-day usable, which is the most important thing!)

Thulinma avatar Jul 12 '24 22:07 Thulinma

> Define "known working"?

A version that doesn't crash.

ACPI interactions and EC commands should never trigger a crash. At worst, they should time out, but the EC should otherwise continue to operate normally.
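
To illustrate the "time out rather than hang" behaviour described above, here is a small host-side sketch of a bounded polling loop. It is not taken from the EC firmware: `wait_ready`, `cmd_result`, and the `fake_device` test double are all hypothetical names, and the poll budget is an arbitrary count rather than a real time base.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical result codes for this sketch.
enum cmd_result { CMD_OK = 0, CMD_TIMEOUT = 1 };

// Poll `ready` at most `max_polls` times. Returns CMD_OK as soon as it
// reports true, or CMD_TIMEOUT once the budget is spent, so the caller
// always regains control.
enum cmd_result wait_ready(bool (*ready)(void *ctx), void *ctx,
                           uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (ready(ctx)) {
            return CMD_OK;
        }
    }
    return CMD_TIMEOUT;
}

// Test double: reports ready only after a fixed number of polls.
struct fake_device {
    uint32_t polls_until_ready;
};

bool fake_ready(void *ctx) {
    struct fake_device *dev = ctx;
    if (dev->polls_until_ready == 0) {
        return true;
    }
    dev->polls_until_ready--;
    return false;
}
```

With a budget of 10 polls, a device that needs 3 polls yields `CMD_OK`, while one that needs 1000 yields `CMD_TIMEOUT` and the caller carries on.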

Only 6 of the ported boards use IT8587E. darp6 is the only model that has an actual release with System76 EC (latest 2021-07-20_93c2809), and it's optional.

> I have not yet had a chance to compile the modified version and see if it indeed lets me restart the EC by holding power

@leviport tested the WDT change on a darp5 and reported it was constantly being triggered. And after looking at the schematics, it obviously won't work: Clevo didn't use the dedicated power switch pin for the power switch.

crawfxrd avatar Jul 12 '24 23:07 crawfxrd

I got my darp5 to reflash externally last night, and while I had the external flasher set up, I hopped between commits to see where the break happened. I found that https://github.com/system76/ec/commit/0f2ff7e54020069d9393453169cbbf2693d56c76 is the first one where it broke. The commit before it, https://github.com/system76/ec/commit/546458e3688a32723b0086fae066a1897c6c9c3a, seems to work fine.

leviport avatar Jul 13 '24 15:07 leviport

WDT gets triggered when keyboard is enabled.

It is hanging on...kbscan_set_column?

kbscan_set_column
;------------------------------------------------------------
;Allocation info for local variables in function 'kbscan_set_column'
;------------------------------------------------------------
;col                       Allocated with name '_kbscan_set_column_col_65536_78'
;colbit                    Allocated with name '_kbscan_set_column_colbit_65536_79'
;------------------------------------------------------------
;	src/board/system76/common/kbscan.c:46: static void kbscan_set_column(uint8_t col) {
;	-----------------------------------------
;	 function kbscan_set_column
;	-----------------------------------------
_kbscan_set_column:
	mov	a,dpl
	mov	dptr,#_kbscan_set_column_col_65536_78
	movx	@dptr,a
;	src/board/system76/common/kbscan.c:48: uint32_t colbit = ~BIT(col);
	movx	a,@dptr
	mov	r7,a
	mov	b,r7
	inc	b
	mov	r7,#0x01
	mov	r6,#0x00
	mov	r5,#0x00
	mov	r4,#0x00
	sjmp	00104$
00103$:
	mov	a,r7
	add	a,r7
	mov	r7,a
	mov	a,r6
	rlc	a
	mov	r6,a
	mov	a,r5
	rlc	a
	mov	r5,a
	mov	a,r4
	rlc	a
	mov	r4,a
00104$:
	djnz	b,00103$
	mov	a,r7
	cpl	a
	mov	r7,a
	mov	a,r6
	cpl	a
	mov	r6,a
	mov	a,r5
	cpl	a
	mov	r5,a
	mov	a,r4
	cpl	a
;	src/board/system76/common/kbscan.c:49: KSOL = colbit & 0xFF;
	mov	dptr,#_KSOL
	mov	a,r7
	movx	@dptr,a
;	src/board/system76/common/kbscan.c:50: KSOH1 = (colbit >> 8) & 0xFF;
	mov	dptr,#_KSOH1
	mov	a,r6
	movx	@dptr,a
;	src/board/system76/common/kbscan.c:51: KSOH2 = (colbit >> 16) & 0x03;
	mov	ar7,r5
	mov	dptr,#_KSOH2
	mov	a,#0x03
	anl	a,r7
	movx	@dptr,a
;	src/board/system76/common/kbscan.c:54: delay_ticks(20);
	mov	dptr,#0x0014
;	src/board/system76/common/kbscan.c:55: }
	ljmp	_delay_ticks
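
For reference while reading the listing, the arithmetic on lines 48-51 of kbscan.c can be modelled on a host as below. This is only a sketch of the values computed from `col`: the struct and the function name `kso_for_column` are hypothetical, the register writes are replaced by struct fields, and the trailing `delay_ticks(20)` call is omitted. It assumes `col` is below 32 so the shift is well defined.

```c
#include <stdint.h>

// Values that would be written to the three key-scan output registers.
struct kso_values {
    uint8_t ksol;   // bits 0-7 of colbit
    uint8_t ksoh1;  // bits 8-15 of colbit
    uint8_t ksoh2;  // bits 16-17 of colbit
};

// Host-side model: every bit is set except the one for the selected
// column, then the 32-bit value is split across the three registers.
struct kso_values kso_for_column(uint8_t col) {
    uint32_t colbit = ~(UINT32_C(1) << col);
    struct kso_values v;
    v.ksol = (uint8_t)(colbit & 0xFF);
    v.ksoh1 = (uint8_t)((colbit >> 8) & 0xFF);
    v.ksoh2 = (uint8_t)((colbit >> 16) & 0x03);
    return v;
}
```

For example, column 9 leaves `ksol` at 0xFF, clears bit 1 of `ksoh1` (0xFD), and leaves `ksoh2` at 0x03.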

crawfxrd avatar Jul 24 '24 17:07 crawfxrd