sof icon indicating copy to clipboard operation
sof copied to clipboard

LNL NOCODEC alsa-bat playback failure

Open plbossart opened this issue 1 year ago • 41 comments

Failure reported in https://sof-ci.01.org/linuxpr/PR4899/build2589/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821

This could be the consequence of a xrun.

cc:

  • #9164

plbossart avatar Apr 30 '24 20:04 plbossart

another one: https://sof-ci.01.org/sofpr/PR9119/build4654/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-599

It's not the same frequency as previously....

plbossart avatar May 13 '24 15:05 plbossart

https://sof-ci.01.org/linuxpr/PR4899/build2589/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821 has an xrun reported by userspace but nothing in the firmware trace.

https://sof-ci.01.org/sofpr/PR9119/build4654/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-599 has no xrun reported by userspace, but the trace has tons of

[ 4138.879225] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192
[ 4138.883155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 56 < period bytes 384, possible glitch
[ 4138.891155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 56 < period bytes 384, possible glitch
[ 4138.892155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 8 < period bytes 384, possible glitch

plbossart avatar May 13 '24 16:05 plbossart

@ujfalusi @kv2019i what do you think of this one? The topology is supposedly the same as on MTL, so the DMA/firmware changes are probably the reason for these test failures.

plbossart avatar May 13 '24 16:05 plbossart

@plbossart Both above test runs have "Broken pipe":

2024-04-30 18:02:26 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1
2024-04-30 18:02:27 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1
Overrun: Broken pipe(-32)

I think this is related. The capture audio file has flawless audio for first 33000 samples and then it breaks down.

kv2019i avatar May 13 '24 19:05 kv2019i

Another, very recent alsabat regression, this time on capture and MTLP_RVP_HDA ba-mtlp-rvp-hda-03. Yet another, different issue?

If different then it would be a very strange timing coincidence because both test cases have been very consistently passing over the last few weeks.

https://sof-ci.01.org/softestpr/PR1193/build422/devicetest/index.html?model=MTLP_RVP_HDA&testcase=check-alsabat-headset-capture-997

marc-hb avatar May 14 '24 13:05 marc-hb

Now we have a broken test with NOCODEC alsa-bat capture

https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-capture-821

I must admit I don't know if what the difference is between alsa-bat-headset-playback and alsa-bat-headset-capture, we need to play and capture on the same devices and test the same loopback

playback: https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821 2024-05-20 19:59:03 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1 2024-05-20 19:59:04 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1

capture: https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-capture-821 2024-05-20 20:00:49 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1 2024-05-20 20:00:51 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1

Looks the same to me? What am I missing @fredoh9 @marc-hb @kv2019i ?

plbossart avatar May 20 '24 22:05 plbossart

@plbossart At least in the capture case, there is again glitches in capture audio. In specific case of https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821 ... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

image

kv2019i avatar May 21 '24 07:05 kv2019i

Looks the same to me? What am I missing @fredoh9 @marc-hb @kv2019i ?

@fredoh9 just explained this to me: NOCODEC is special.

Outside NOCODEC:

  • alsabat-playback: SOF playback, USB capture
  • alsabat-capture: USB playback, SOF capture

NOCODEC is special, it has no USB. So:

  • SOF playback, SOF capture.

So both tests do more or less the same thing on NOCODEC.

marc-hb avatar May 22 '24 00:05 marc-hb

Right, so can we skip the capture part to avoid unnecessary confusions?

plbossart avatar May 22 '24 15:05 plbossart

... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

and its also suspiciously close to 0.5 seconds of audio. 0.5 second USB audio buffer size ???

lgirdwood avatar May 22 '24 16:05 lgirdwood

... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

and its also suspiciously close to 0.5 seconds of audio. 0.5 second USB audio buffer size ???

There's no USB involved in Nocodec mode, it's an SSP loopback.

plbossart avatar May 22 '24 16:05 plbossart

Right, so can we skip the capture part to avoid unnecessary confusions?

"send patches"

marc-hb avatar May 22 '24 18:05 marc-hb

Just spotted a capture failure on HDA

-> moved to new #9164

https://sof-ci.01.org/sofpr/PR9151/build4789/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997

2024-05-22 10:38:13 UTC [REMOTE_COMMAND] alsabat -Phw:CODEC,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 997 -k 2.1
2024-05-22 10:38:14 UTC [REMOTE_COMMAND] alsabat -Chw:sofhdadsp,0 -c 2 -r 48000 -f S16_LE -F 997 -k 2.1
 FAIL: Peak freq too low 973.39 Hz
 FAIL: Peak freq too low 974.85 Hz
 FAIL: Peak freq too low 977.05 Hz
 FAIL: Peak freq too low 978.52 Hz
 FAIL: Peak freq too low 979.98 Hz
 FAIL: Peak freq too low 982.18 Hz
 FAIL: Peak freq too low 983.64 Hz
 FAIL: Peak freq too low 987.30 Hz
 FAIL: Peak freq too low 988.77 Hz
alsa-utils version 1.2.6

Entering capture thread (ALSA).
Get period size: 3000  buffer size: 24000
Recording ...
Capture completed.

BAT analysis: signal has 65536 frames at 48000 Hz, 2 channels, 2 bytes per sample.

Channel 1 - Checking for target frequency 997.00 Hz
Amplitude: 9878.5; Percentage: [30]
Detected peak at 973.39 Hz of 17.57 dB
 Total 17.6 dB from 973.39 to 973.39 Hz
Detected peak at 974.85 Hz of 17.64 dB
 Total 20.6 dB from 974.85 to 974.85 Hz
Detected peak at 977.05 Hz of 17.74 dB
 Total 22.4 dB from 977.05 to 977.05 Hz
Detected peak at 978.52 Hz of 18.62 dB
 Total 23.9 dB from 978.52 to 978.52 Hz
Detected peak at 979.98 Hz of 18.92 dB
 Total 25.1 dB from 979.98 to 979.98 Hz
Detected peak at 982.18 Hz of 18.84 dB
 Total 26.7 dB from 981.45 to 982.18 Hz
Detected peak at 983.64 Hz of 19.88 dB
 Total 27.5 dB from 983.64 to 983.64 Hz
Detected peak at 987.30 Hz of 20.82 dB
 Total 29.8 dB from 985.11 to 987.30 Hz
Detected peak at 988.77 Hz of 22.17 dB
 Total 30.5 dB from 988.77 to 988.77 Hz
Detected peak at 996.83 Hz of 34.94 dB
 Total 41.0 dB from 990.23 to 1004.15 Hz
 PASS: Peak detected at target frequency
Detected at least 10 signal(s) in total

Return value is -1003

marc-hb avatar May 22 '24 18:05 marc-hb

Just spotted a capture failure on HDA

different interface, different issue please

plbossart avatar May 22 '24 19:05 plbossart

@plbossart can I ask for the logs with ipc payload?

wszypelt avatar May 24 '24 08:05 wszypelt

In theory the dmesg log should already have the entire payload provided, an option must be missing in the CI setup. @fredoh9 @marc-hb @ssavati do we turn the BIT(11) on for sof_debug?

#define SOF_DBG_DUMP_IPC_MESSAGE_PAYLOAD	BIT(11) /* On top of the IPC message header
							 * dump the message payload also

plbossart avatar May 24 '24 13:05 plbossart

All LNL RVPs in SOF CI have the IPC dump enabled.

fredoh9 avatar May 28 '24 21:05 fredoh9

do we turn the BIT(11) on for sof_debug?

It's hit-and-miss because:

  • We haven't productized Ansible yet, so it breaks often when someone reserves a device or when setting up a new one.
  • We can't set bits one by one and we don't have "official" sof_debug values documented anywhere yet.

marc-hb avatar May 28 '24 23:05 marc-hb

According to @kv2019i :

sof_debug=1 for others, and 3073/0x0c01 for nocodec

Correct?

marc-hb avatar May 29 '24 04:05 marc-hb

need to keep bit(11) enabled.

  • nocodec mode 0xC01
  • Others 0x801

fredoh9 avatar May 29 '24 04:05 fredoh9

All LNL RVPs in SOF CI have the IPC dump enabled.

i have checked all BA devices also for IPC dump. its enabled with above values

ssavati avatar May 29 '24 06:05 ssavati

@fredoh9, if we want to have payload dump enabled on all configurations then 0xc01 for nocodec and 0x801 for others. It really depends what we want. As a sidenote: the payload printing deemed to be 'heavy' and kept disabled. It also alters timings (adds delay) which can hide races when we have less kernel print activity.

It would be better if we could get bpftrace working to get the IPC message dumps, but they will be without the kernel print provided contexts... I have this semi working implementation:

#!/usr/bin/bpftrace
   
struct sof_ipc4_msg {
	union {
		unsigned long header_u64;
		struct {
			unsigned int primary;
			unsigned int extension;
		};
	};

	unsigned long data_size;
	void *data_ptr;
};
 
BEGIN
{
	printf("SOF IPC4 message logging. Ctrl-C to end.\n");
}

kprobe:sof_ipc4_log_header {
	$msg = (struct sof_ipc4_msg *)arg2;

	if (arg3 == 1 && $msg->data_size != 0) {
		printf("%s : 0x%x|0x%x [data size:: %llu]\n", str(arg1),
		       $msg->primary, $msg->extension, $msg->data_size);
		if (!strcontains(str(arg1), "done")) {
			printf("Message payload:\n%rh\n", buf($msg->data_ptr,
			       $msg->data_size));
		}
	} else {
		printf("%s : 0x%x|0x%x\n", str(arg1), $msg->primary,
		       $msg->extension);
	}	
}

it prints all IPC messages but the payload printing will cut the print at BPFTRACE_MAX_STRLEN and so far I was not able to work that around to split up the payload print. There are limitations what we can do with bpftrace, if anyone can make this work then we can probably introduce an IPC log alongside of dmesg and the firmware log?

ujfalusi avatar May 29 '24 14:05 ujfalusi

IIRC we can increase the MAX_STRLEN. I worked on this months ago and of course don't remember any details. There was also a buffer dump option but that has some dependencies (edit: that was %rh which is already used above).

plbossart avatar May 29 '24 15:05 plbossart

@mathias-nyman has a suggestion to get a large string, see https://github.com/bpftrace/bpftrace/issues/305#issuecomment-1737129371

plbossart avatar May 29 '24 15:05 plbossart

I have this semi working implementation:

This seems very interesting - and 100% off-topic. Please file a new issue.

marc-hb avatar May 29 '24 17:05 marc-hb

@plbossart, OK, I have this now which can print up to 500 lines of 16 bytes out:

#!/usr/bin/bpftrace
   
struct sof_ipc4_msg {
	union {
		unsigned long header_u64;
		struct {
			unsigned int primary;
			unsigned int extension;
		};
	};

	unsigned long data_size;
	void *data_ptr;
};
 
BEGIN
{
	printf("SOF IPC4 message logging. Ctrl-C to end.\n");
}

kprobe:sof_ipc4_log_header {
	$msg = (struct sof_ipc4_msg *)arg2;

	if (arg3 == 1 && $msg->data_size != 0) {
		printf("%s : 0x%x|0x%x [data size:: %llu]\n", str(arg1),
		       $msg->primary, $msg->extension, $msg->data_size);
		if (!strcontains(str(arg1), "done")) {
			$count = (int64) $msg->data_size;
			$ptr = (uint8*) $msg->data_ptr;
			$line = 0;

			printf("Message payload:\n");
			while ($line < 500) {
				if ($count <= 16) {
					printf("%rh\n", buf($ptr, $count));
					break;
				}
				printf("%rh\n", buf($ptr, 16));
				$count -= 16;
				if ($count == 0) {
					break;
				}
				$ptr += 16;
				$line++;
			}
		} else {
			printf("\n");
		}
	} else {
		printf("%s : 0x%x|0x%x\n", str(arg1), $msg->primary,
		       $msg->extension);
		if (strcontains(str(arg1), "done")) {
			printf("\n");
		}
	}
}

I'm not sure how to format it, should we drop the * done and ipc tx reply : lines (the reply print indicates a fail for example)? Is it OK to print the payload as I do, as bytes?

Example output

sof-up-xtreme:[root]:~# ./ipc4_msg_trace.bt
Attaching 2 probes...
SOF IPC4 message logging. Ctrl-C to end.
ipc rx       : 0x1b080000|0x0
ipc rx done  : 0x1b080000|0x0

ipc tx       : 0x44000000|0x31400008 [data size:: 8]
Message payload:
b2 2e 86 eb 00 00 00 00
ipc tx reply : 0x64000000|0x31400008
ipc tx done  : 0x44000000|0x31400008 [data size:: 8]

ipc tx       : 0x44000000|0x3060004c [data size:: 76]
Message payload:
00 01 00 00 00 10 00 00 01 00 00 00 ef 01 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
ipc tx reply : 0x64000000|0x3060004c
ipc tx done  : 0x44000000|0x3060004c [data size:: 76]

ipc tx       : 0x11000005|0x0
ipc tx reply : 0x31000000|0x0
ipc tx done  : 0x11000005|0x0

ipc tx       : 0x40000004|0x15 [data size:: 84]
Message payload:
00 00 00 00 c0 00 00 00 80 01 00 00 01 00 00 00
80 bb 00 00 10 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 10 01 00 80 bb 00 00 20 00 00 00
10 ff ff ff 01 00 00 00 00 00 00 00 02 20 01 00
00 00 00 00 00 00 00 00 80 01 00 00 01 00 00 00
00 00 00 00
ipc tx reply : 0x60000000|0x15
ipc tx done  : 0x40000004|0x15 [data size:: 84]

ipc tx       : 0x40000006|0x10 [data size:: 64]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00 ff ff ff ff ff ff ff 7f
02 00 00 00 00 00 00 00 40 0d 03 00 00 00 00 00
ipc tx reply : 0x60000000|0x10
ipc tx done  : 0x40000006|0x10 [data size:: 64]

ipc tx       : 0x44000006|0x30000018 [data size:: 24]
Message payload:
ff ff ff ff ff ff ff 7f 02 00 00 00 00 00 00 00
40 0d 03 00 00 00 00 00
ipc tx reply : 0x64000000|0x30000018
ipc tx done  : 0x44000006|0x30000018 [data size:: 24]

ipc tx       : 0x40000002|0xa [data size:: 40]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00
ipc tx reply : 0x60000000|0xa
ipc tx done  : 0x40000002|0xa [data size:: 40]

ipc tx       : 0x1101000c|0x0
ipc tx reply : 0x31000000|0x0
ipc tx done  : 0x1101000c|0x0

ipc tx       : 0x40000003|0x1000a [data size:: 40]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00
ipc tx reply : 0x60000000|0x1000a
ipc tx done  : 0x40000003|0x1000a [data size:: 40]

ipc tx       : 0x40010006|0x10010 [data size:: 64]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00 ff ff ff ff ff ff ff 7f
02 00 00 00 00 00 00 00 40 0d 03 00 00 00 00 00
ipc tx reply : 0x60000000|0x10010
ipc tx done  : 0x40010006|0x10010 [data size:: 64]

ipc tx       : 0x44010006|0x30000018 [data size:: 24]
Message payload:
ff ff ff ff ff ff ff 7f 02 00 00 00 00 00 00 00
40 0d 03 00 00 00 00 00
ipc tx reply : 0x64000000|0x30000018
ipc tx done  : 0x44010006|0x30000018 [data size:: 24]

ipc tx       : 0x4000000a|0x1000a [data size:: 40]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00
ipc tx reply : 0x60000000|0x1000a
ipc tx done  : 0x4000000a|0x1000a [data size:: 40]

ipc tx       : 0x4400000a|0x30000058 [data size:: 88]
Message payload:
58 00 00 00 02 00 00 00 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff
ff ff ff ff 01 00 00 00 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 9e 73 13 20
00 00 00 00 b2 7f 00 00
ipc tx reply : 0x64000000|0x30000058
ipc tx done  : 0x4400000a|0x30000058 [data size:: 88]

ipc tx       : 0x4000000b|0x1000a [data size:: 40]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00
ipc tx reply : 0x60000000|0x1000a
ipc tx done  : 0x4000000b|0x1000a [data size:: 40]

ipc tx       : 0x4400000b|0x30000038 [data size:: 56]
Message payload:
38 00 00 00 02 00 01 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 ff ff ff ff 04 00 ff ff
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 40 00 00 00 00 00 00
ipc tx reply : 0x64000000|0x30000038
ipc tx done  : 0x4400000b|0x30000038 [data size:: 56]

ipc tx       : 0x40000010|0x1000a [data size:: 40]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 00 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00
ipc tx reply : 0x60000000|0x1000a
ipc tx done  : 0x40000010|0x1000a [data size:: 40]

ipc tx       : 0x44000010|0x3c80001c [data size:: 28]
Message payload:
00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x64000000|0x3c80001c
ipc tx done  : 0x44000010|0x3c80001c [data size:: 28]

ipc tx       : 0x44000010|0x3000006c [data size:: 108]
Message payload:
6c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 e8 00 00 00 1e
00 00 00 0c d3 4d 62 00 b1 c2 09 04 55 55 55 05
50 fa 1e 00 55 60 94 00 7e 98 6a ff 83 c9 fe 01
64 47 47 22 17 56 74 01 1c c7 71 00 77 77 77 ff
d8 77 1f 00 05 00 00 00 00 80 43 00 d7 7d 04 00
a0 ce 25 00 d7 7d 09 00 b1 b5 00 00
ipc tx reply : 0x64000000|0x3000006c
ipc tx done  : 0x44000010|0x3000006c [data size:: 108]

ipc tx       : 0x40010004|0x10015 [data size:: 84]
Message payload:
00 00 00 00 80 01 00 00 80 01 00 00 01 00 00 00
80 bb 00 00 20 00 00 00 10 ff ff ff 01 00 00 00
00 00 00 00 02 20 01 00 80 bb 00 00 20 00 00 00
10 ff ff ff 01 00 00 00 00 00 00 00 02 20 01 00
00 00 00 00 00 08 00 00 00 03 00 00 01 00 00 00
00 00 00 00
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x60000000|0x10015
ipc tx done  : 0x40010004|0x10015 [data size:: 84]

ipc tx       : 0x45000004|0x6
ipc tx reply : 0x65000000|0x6
ipc tx done  : 0x45000004|0x6

ipc tx       : 0x45000006|0x2
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x65000000|0x2
ipc tx done  : 0x45000006|0x2

ipc tx       : 0x45000002|0x3
ipc tx reply : 0x65000000|0x3
ipc tx done  : 0x45000002|0x3

ipc tx       : 0x45000003|0x10006
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x65000000|0x10006
ipc tx done  : 0x45000003|0x10006

ipc tx       : 0x45010006|0xa
ipc tx reply : 0x65000000|0xa
ipc tx done  : 0x45010006|0xa

ipc tx       : 0x4500000a|0xb
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x65000000|0xb
ipc tx done  : 0x4500000a|0xb

ipc tx       : 0x4500000b|0x10
ipc tx reply : 0x65000000|0x10
ipc tx done  : 0x4500000b|0x10

ipc tx       : 0x45000010|0x10004
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x65000000|0x10004
ipc tx done  : 0x45000010|0x10004

ipc tx       : 0x13010003|0x0
ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13010003|0x0

ipc tx       : 0x13010004|0x0
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13010004|0x0

ipc tx       : 0x13000003|0x0
ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13000003|0x0

ipc tx       : 0x13000004|0x0
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13000004|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx       : 0x13000003|0x0
ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13000003|0x0

ipc tx       : 0x13010003|0x0
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x33000000|0x0
ipc tx done  : 0x13010003|0x0

ipc tx       : 0x13000002|0x1 [data size:: 12]
Message payload:
02 00 00 00 01 00 00 00 00 00 00 00
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x33000000|0x1
ipc tx done  : 0x13000002|0x1 [data size:: 12]

ipc tx       : 0x46000002|0x3
ipc tx reply : 0x66000000|0x3
ipc tx done  : 0x46000002|0x3

ipc tx       : 0x12000000|0x0
ipc tx reply : 0x32000000|0x0
ipc tx done  : 0x12000000|0x0

ipc tx       : 0x12010000|0x0
ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc rx       : 0x1b060000|0x0
ipc rx done  : 0x1b060000|0x0

ipc tx reply : 0x32000000|0x0
ipc tx done  : 0x12010000|0x0

ipc tx       : 0x44000000|0x3060004c [data size:: 76]
Message payload:
00 01 00 00 00 10 00 00 00 00 00 00 ef 01 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
ipc tx reply : 0x64000000|0x3060004c
ipc tx done  : 0x44000000|0x3060004c [data size:: 76]

ipc tx       : 0x47000000|0x0 [data size:: 8]
Message payload:
01 00 00 00 00 00 00 00
ipc tx reply : 0x67000000|0x0
ipc tx done  : 0x47000000|0x0 [data size:: 8]

ujfalusi avatar May 30 '24 07:05 ujfalusi

Nice!

For the size, don't we have a limit of 4k per IPC?

Also we should ask our firmware friends if they have a preference for the format, if their tools can eat our traces directly that would be great.

plbossart avatar May 30 '24 13:05 plbossart

Just a note: given reproduction rate is low (so getting the logs without CI is not so easy) and this is a P1 item, can we enable the old-style IPC dump on the DUTs where the error is seen, so we can move this bug forward ASAP? UPDATE: I believe @ssavati has updated some of DUTs already, so we might be good already

kv2019i avatar May 30 '24 13:05 kv2019i

Nice!

For the size, don't we have a limit of 4k per IPC?

Also we should ask our firmware friends if they have a preference for the format, if their tools can eat our traces directly that would be great.

I think we might still have issue with the BPF way when the message payload is split across multiple messages, but not sure about that. No, we will print a separate message with it's own payload, so should be ok to feed into a validation system, if we can format things for it to consume.

ujfalusi avatar May 30 '24 13:05 ujfalusi

We have a winner, finally hit the bug on a machine with IPC payload dump enabled: https://sof-ci.01.org/linuxpr/PR5034/build3319/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-nocodec-32bits-599 (UPDATE: direct link to bat.wav -> https://sof-ci.01.org/linuxpr/PR5034/build3319/devicetest//LNLM_RVP_NOCODEC/check-alsabat-nocodec-32bits-599/bat.wav.heXg5I )

FYI @wszypelt @abonislawski

kv2019i avatar Jun 03 '24 10:06 kv2019i