
debugfs: dumping some items in debugfs will crash the system

Open keyonjie opened this issue 6 years ago • 16 comments

What I have observed is that on CML/CNL platforms, running the commands below crashes the whole system:

  1. echo on > /sys/bus/pci/devices/0000:00:1f.3/power/control
  2. cd /sys/kernel/debug/sof
  3. hexdump hda (or dsp)

There might be an out-of-bounds memory access; we need to figure this out for debugging purposes.

keyonjie avatar Oct 09 '19 21:10 keyonjie

@keyonjie I'm also able to reproduce this on my APL board when dumping /sys/kernel/debug/sof/dsp

root@gr-mrb:/sys/kernel/debug/sof# hexdump dsp
0000000 0000 0000 0202 0101 0001 0000 0000 0000
0000010 0000 0000 0000 0000 0000 0000 0000 0000
*
0000050 0003

In my case the hang seems to occur here: https://github.com/thesofproject/linux/blob/topic/sof-dev/sound/soc/sof/debug.c#L303 immediately after pos goes higher than 32768.

dragosht avatar Oct 15 '19 10:10 dragosht

I don't see any bounds checks at all. Shouldn't there be some? Only a finite region of memory belongs to the DSP, so dumping more than that is not a good idea. Having /sys/kernel/debug/sof/dsp act as a glorified /dev/mem is probably not the intention here. The read also returns exactly as many bytes as were requested, even if fewer are available.

(disregard this comment if I missed some bounds check in an upper layer)
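
A minimal userspace sketch of the kind of bounds check being asked about, assuming each debugfs entry is a fixed-size window. The names `struct region` and `region_read` are hypothetical and are not taken from sound/soc/sof/debug.c; the sketch only illustrates clamping a read to the bytes that remain and returning a short count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one debugfs entry: a window of `size` bytes. */
struct region {
	const unsigned char *base;
	size_t size;
};

/*
 * Copy at most `count` bytes starting at `*pos` into `out`.
 * Returns the number of bytes actually copied (0 at or past the end)
 * and advances `*pos` by that amount, like a read() file operation.
 */
static size_t region_read(const struct region *r, unsigned char *out,
			  size_t count, size_t *pos)
{
	assert(r && out && pos);

	if (*pos >= r->size)
		return 0;		/* at or past the end: nothing to read */

	size_t avail = r->size - *pos;

	if (count > avail)
		count = avail;		/* short read instead of an overrun */

	memcpy(out, r->base + *pos, count);
	*pos += count;
	return count;
}
```

With a clamp like this, a request that runs past the end of the window returns only the remaining bytes, and the next call returns 0 so that a tool such as hexdump stops reading.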

paulstelian97 avatar Oct 15 '19 11:10 paulstelian97

There's definitely a size that's used for initialization of debugfs items, so either a) the checks are not correct or b) the sizes are incorrect in the first place.

plbossart avatar Oct 17 '19 17:10 plbossart

@mengdonglin can we assign someone to this one? This looks like a really bad problem.

plbossart avatar Jan 17 '20 00:01 plbossart

@keyonjie @lgirdwood do you know what the 'dsp' BAR debugfs size might be on CNL?

I find that with the following hack there's no crash:

static const struct snd_sof_debugfs_map cnl_dsp_debugfs[] = {
	{"hda", HDA_DSP_HDA_BAR, 0, 0x4000, SOF_DEBUGFS_ACCESS_ALWAYS},
	{"pp", HDA_DSP_PP_BAR,  0, 0x1000, SOF_DEBUGFS_ACCESS_ALWAYS},
	/* original entry: {"dsp", HDA_DSP_BAR, 0, 0x10000, SOF_DEBUGFS_ACCESS_ALWAYS}, */
	/* hack: shrink the "dsp" window from 0x10000 to 0x1000 bytes */
	{"dsp", HDA_DSP_BAR,  0, 0x1000, SOF_DEBUGFS_ACCESS_ALWAYS},
};

plbossart avatar Jun 09 '20 19:06 plbossart

@plbossart according to the programming reference, it looks to be even larger than 0x10000; e.g. the SDW IP registers are located in 0x30000~0x6FFFF.

From the result, I guess that accessing some SLIMbus, ANC, LP GPDMA, or DMIC registers leads to the crash.
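
If specific register blocks turn out to be the problem, one option would be to skip them during a dump rather than shrink the whole window. The sketch below is only an illustration of that idea: `struct skip_range`, `skip_ranges`, and `offset_is_skipped` are hypothetical names, and the single table entry reuses the SDW window quoted above purely as a placeholder. The thread does not establish which ranges are actually unsafe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a register block that must not be read. */
struct skip_range {
	uint32_t start;	/* first offset in the block */
	uint32_t end;	/* last offset in the block (inclusive) */
};

/* Placeholder values only; the real unsafe ranges are unknown here. */
static const struct skip_range skip_ranges[] = {
	{ 0x30000, 0x6FFFF },
};

/* Return true if [offset, offset + len) touches any skipped block. */
static bool offset_is_skipped(uint32_t offset, uint32_t len)
{
	if (len == 0)
		return false;

	uint32_t last = offset + (len - 1);

	if (last < offset)
		return true;	/* range wraps around: treat as unsafe */

	for (size_t i = 0; i < sizeof(skip_ranges) / sizeof(skip_ranges[0]); i++) {
		if (offset <= skip_ranges[i].end && last >= skip_ranges[i].start)
			return true;
	}
	return false;
}
```

A dump loop could call `offset_is_skipped()` for each chunk and emit a filler value instead of touching the hardware when it returns true.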

keyonjie avatar Jun 10 '20 08:06 keyonjie

@keyonjie I am starting to wonder if this has to do with the register ownership. I am not sure what happens if you try to access a register owned by the DSP, e.g. the LP GPDMA.

plbossart avatar Jun 10 '20 13:06 plbossart

@plbossart Yes, I have the same feeling. Previously I observed that we get all 0xffffffff values if the registers are not readable, but I'm not sure whether reading without holding ownership will crash the DSP or even Linux. Hi @lbetlej, do you have any knowledge about this?
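
The all-ones observation can be turned into a simple check on data that has already been read. This is a sketch only: `words_all_ones` is a hypothetical helper, and an all-ones result is just a hint, since a readable register can legitimately hold 0xffffffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every 32-bit word in the buffer is 0xffffffff,
 * which hints that the registers behind it were not readable.
 * An empty buffer returns false.
 */
static bool words_all_ones(const uint32_t *words, size_t n)
{
	assert(words || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (words[i] != UINT32_MAX)
			return false;
	}
	return n > 0;
}
```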

keyonjie avatar Jun 10 '20 14:06 keyonjie

@plbossart the issue is still there; maybe this can be covered by a security check? @libinyang @RanderWang FYI.

keyonjie avatar Jun 11 '21 02:06 keyonjie

@keyonjie can someone paste the kernel oops? It could be a data abort, i.e. the physical bus address does not exist (or, as already mentioned, is owned by the DSP). Looks like @plbossart has the fix though.

lgirdwood avatar Jun 11 '21 09:06 lgirdwood

@lgirdwood the whole OS panics when this happens, so no log is visible anymore; a hardware reboot is the only thing you can do.

@plbossart I'd really appreciate it if you already have a fix; as the assignee I can't do anything about it at the moment.

keyonjie avatar Jun 11 '21 09:06 keyonjie

> @plbossart I'd really appreciate it if you already have a fix; as the assignee I can't do anything about it at the moment.

Fix is to make the BAR smaller on applicable platforms.

lgirdwood avatar Jun 11 '21 11:06 lgirdwood

No I don't have a fix. I asked what the size of the memory was and didn't get an answer.

https://github.com/thesofproject/linux/issues/1296#issuecomment-641534874

plbossart avatar Jun 11 '21 14:06 plbossart