linux icon indicating copy to clipboard operation
linux copied to clipboard

Add ipc4 timestamp support on MTL

Open RanderWang opened this issue 2 years ago • 53 comments

This patch add timestamp support on MTL

RanderWang avatar Jul 27 '22 08:07 RanderWang

@RanderWang I believe you would need the plumbing added in https://github.com/thesofproject/linux/pull/3792 to test this PR?

plbossart avatar Jul 29 '22 21:07 plbossart

@RanderWang I believe you would need the plumbing added in #3792 to test this PR?

@plbossart thanks. I also tested with driver modification. The get_time_info was called by snd_sof_pcm_period_elapsed to check data status since it would be called infinite times in every pcm pointer update. It was only to validate the function and I missed ASoC and ALSA.

I also want to know whether application will get time stamp by this interface and how frequent it will query? thanks!

RanderWang avatar Aug 01 '22 06:08 RanderWang

@RanderWang I believe you would need the plumbing added in #3792 to test this PR?

thanks. I tested with PR and the time log should be correct.

RanderWang avatar Aug 01 '22 08:08 RanderWang

I also want to know whether application will get time stamp by this interface and how frequent it will query? thanks!

that's application-dependent @RanderWang. First the application has to enable explicitly timestamping with software parameters.

	/* enable tstamp */
	err = snd_pcm_sw_params_set_tstamp_mode(handle_p, swparams_p, SND_PCM_TSTAMP_ENABLE);

	err = snd_pcm_sw_params_set_tstamp_type(handle_p, swparams_p, TSTAMP_TYPE);

The timestamp is given with the 'status' IOCTL and it could be send at a very high frequency or very low. The status is typically used when a period elapsed or when a timer expired. It's not uncommon for the status to be called multiple times so that the buffer is really filled as much as possible.

plbossart avatar Aug 01 '22 15:08 plbossart

updated, thanks!

RanderWang avatar Aug 03 '22 08:08 RanderWang

@plbossart updated for capture. One issue is that llp registger is for normal dma but HDA uses link dma, so we can't get llp info for HDA case. And in fw there is not such support for it. We need to get wclk and position info from host MMIO space. I tested HDA case and found no matched llp slot.

RanderWang avatar Aug 15 '22 07:08 RanderWang

@RanderWang for HDAudio the register is LPIB, you should be able to read it from the host?

plbossart avatar Aug 15 '22 08:08 plbossart

@RanderWang for HDAudio the register is LPIB, you should be able to read it from the host?

I will use intel HDA audio global synchronization registers, like https://github.com/thesofproject/linux/blob/topic/sof-dev/sound/pci/hda/hda_controller.c#L411-L422

		/* Read wall clock counter */
		wallclk_ctr = snd_hdac_chip_readl(azx_bus(chip), WALFCC);

		/* Read TSC counter */
		tsc_counter_l = snd_hdac_chip_readl(azx_bus(chip), TSCCL);
		tsc_counter_h = snd_hdac_chip_readl(azx_bus(chip), TSCCU);

		/* Read Link counter */
		ll_counter_l = snd_hdac_chip_readl(azx_bus(chip), LLPCL);
		ll_counter_h = snd_hdac_chip_readl(azx_bus(chip), LLPCU);

RanderWang avatar Aug 15 '22 08:08 RanderWang

This should be enough @RanderWang

                /* Read Link counter */
		ll_counter_l = snd_hdac_chip_readl(azx_bus(chip), LLPCL);
		ll_counter_h = snd_hdac_chip_readl(azx_bus(chip), LLPCU);

I still don't get how we are going to support multi-stream capture though, we don't have a start offset so we might be restricted to using only the host DMA counters and have no ability to report the delay when two FEs are connected to a single capture BE.

plbossart avatar Aug 15 '22 08:08 plbossart

This should be enough @RanderWang

                /* Read Link counter */
		ll_counter_l = snd_hdac_chip_readl(azx_bus(chip), LLPCL);
		ll_counter_h = snd_hdac_chip_readl(azx_bus(chip), LLPCU);

I still don't get how we are going to support multi-stream capture though, we don't have a start offset so we might be restricted to using only the host DMA counters and have no ability to report the delay when two FEs are connected to a single capture BE.

@plbossart That is a story of close source fw and we can add support in converged sof fw, how about it ?

RanderWang avatar Aug 15 '22 08:08 RanderWang

@RanderWang if you find a way to add this capability in a backwards-compatible manner that would be good indeed. Changes would probably have to be reviewed by @lgirdwood @mmaka1 and others since you would be changing the ABI and firmware support.

plbossart avatar Aug 15 '22 08:08 plbossart

@RanderWang if you find a way to add this capability in a backwards-compatible manner that would be good indeed. Changes would probably have to be reviewed by @lgirdwood @mmaka1 and others since you would be changing the ABI and firmware support.

sure. The first 4k page is reserved for fw_reg and now sizeof(FwRegisters) = 2656, so we can append some pipeline_reg for capture without affecting windows.

RanderWang avatar Aug 17 '22 05:08 RanderWang

update HDA support since fw doesn't support llp info for HDA. Read wclk and llp in host MMIO space.

RanderWang avatar Aug 17 '22 05:08 RanderWang

Sorry for the late comments to some topics, but looking at this series again, I found some new concerns. A few of the functions have grown really big and are getting hard to read now. There is now mixing of generic SOF additions that should be separate from IPC4 specific changes.

I also see more logic relying on usleep delays. How do we test this and deem correct operation? I think at minimum we need test cases to sof-test that run aplay with "--test-position" and preferrably something more robust.

Thanks. We use ./audio_time to test it. This is a built-in tool in alsa lib. the usage of audio_time: https://www.kernel.org/doc/html/latest/sound/designs/timestamping.html

RanderWang avatar Aug 22 '22 02:08 RanderWang

@plbossart @ujfalusi updated (1) remove loop to get llp (2) rework spcm structure definition (3) refine memory window reading (4) split the timestamp commit into small ones

RanderWang avatar Aug 23 '22 07:08 RanderWang

@plbossart Add delay info in topology. https://github.com/thesofproject/sof/pull/6180

RanderWang avatar Aug 24 '22 07:08 RanderWang

@plbossart I am struggling with pipeline latency calculation. In my opinion, pipeline latency is the input pointer - output pointer. In fw, it gets input pointer from the entry module and output pointer from exit module and calculate pipeline latency dynamically. How to implement it in kernel ? I don't have any idea.

RanderWang avatar Aug 24 '22 13:08 RanderWang

@RanderWang for playback the input pointer is DPIB, and the output pointer the LLP reported in firmware windows.

I don't understand what the issue is?

plbossart avatar Aug 24 '22 14:08 plbossart

@RanderWang for playback the input pointer is DPIB, and the output pointer the LLP reported in firmware windows.

I don't understand what the issue is?

yes, I did it in last PR. But I don't understand "in a minimal implementation but with a delay actually added in the topology! " How to get it by topology ? we can get pipeline info in topology but how to get these pointers with topology ?

RanderWang avatar Aug 25 '22 05:08 RanderWang

@RanderWang for playback the input pointer is DPIB, and the output pointer the LLP reported in firmware windows.

I don't understand what the issue is?

Another issue is : no LLP support in FW for HDA since it uses link dma, so my last PR reads llp from host memory space which is in the same domain of DPIB.

RanderWang avatar Aug 25 '22 05:08 RanderWang

@RanderWang for playback the input pointer is DPIB, and the output pointer the LLP reported in firmware windows. I don't understand what the issue is?

yes, I did it in last PR. But I don't understand "in a minimal implementation but with a delay actually added in the topology! " How to get it by topology ? we can get pipeline info in topology but how to get these pointers with topology ?

what I meant is increase the buffering on the copier host, and use the topology to specify that buffer size/duration.

plbossart avatar Aug 25 '22 07:08 plbossart

@RanderWang for playback the input pointer is DPIB, and the output pointer the LLP reported in firmware windows. I don't understand what the issue is?

yes, I did it in last PR. But I don't understand "in a minimal implementation but with a delay actually added in the topology! " How to get it by topology ? we can get pipeline info in topology but how to get these pointers with topology ?

what I meant is increase the buffering on the copier host, and use the topology to specify that buffer size/duration.

sure. It depends on multiple-queue patch and I will send a patch after it is merged

RanderWang avatar Aug 25 '22 08:08 RanderWang

what I meant is increase the buffering on the copier host, and use the topology to specify that buffer size/duration.

sure. It depends on multiple-queue patch and I will send a patch after it is merged

For my education, what is the dependency @RanderWang ?

plbossart avatar Aug 25 '22 08:08 plbossart

what I meant is increase the buffering on the copier host, and use the topology to specify that buffer size/duration.

sure. It depends on multiple-queue patch and I will send a patch after it is merged

For my education, what is the dependency @RanderWang ?

@plbossart Add deep buffer support in topology including adjusting copier host buffer size, max period size and d0i3 enabling. Since It will use mixin & mixout, multi-queue will be used.

RanderWang avatar Aug 25 '22 08:08 RanderWang

what I meant is increase the buffering on the copier host, and use the topology to specify that buffer size/duration.

sure. It depends on multiple-queue patch and I will send a patch after it is merged

For my education, what is the dependency @RanderWang ?

@plbossart Add deep buffer support in topology including adjusting copier host buffer size, max period size and d0i3 enabling. Since It will use mixin & mixout, multi-queue will be used.

For your internal tests, you could increase the copier host buffer size today, right?

I understand that for integration we need the mixin-mixout topology, a second stream added and then the increased buffer size added.

plbossart avatar Aug 25 '22 08:08 plbossart

For your internal tests, you could increase the copier host buffer size today, right?

yes, I can, with the buffer size change we can get the different delay value for validation ?

RanderWang avatar Aug 25 '22 08:08 RanderWang

@plbossart , my test result is: S16LE 2CH host copier dma buffer setting changes (1) dma buffer size = 384 (96 samples) delay = 3xx, (2) dma_buffer size = 768(192 samples) delay = 4xx (3) dma_buffer size = 1536 (384 samples) delay = 6xx (4) dma_buffer size = 3072 (768 samples) delay = 10xx

RanderWang avatar Aug 25 '22 09:08 RanderWang

@plbossart I will add D0ix change after this PR merged

RanderWang avatar Sep 16 '22 09:09 RanderWang

@RanderWang this PR is really a bit too heavy. Can you remove the 'get_time_info' stuff and just provide a minimal PR with just the support for the .delay? I find the whole section on wclk way to complicated, and we don't seem to need it for deep-buffer handling. Thanks

plbossart avatar Sep 16 '22 14:09 plbossart

@RanderWang this PR is really a bit too heavy. Can you remove the 'get_time_info' stuff and just provide a minimal PR with just the support for the .delay? I find the whole section on wclk way to complicated, and we don't seem to need it for deep-buffer handling. Thanks

updated. Validated on MTL. Thanks!

RanderWang avatar Sep 19 '22 08:09 RanderWang