sof icon indicating copy to clipboard operation
sof copied to clipboard

topology1: nocodec multistream capture

Open plbossart opened this issue 3 years ago • 33 comments

This is the first step to enable multi-stream capture as a reference for topology1 platforms. Before splitting streams, we first need to split the capture pipelines in two. The modified nocodec topology looks like this:

sof-tgl-nocodec

For reference, the existing topology was:

sof-tgl-nocodec

plbossart avatar Aug 11 '22 14:08 plbossart

Minor mistake with COMP_SCHED, now fixed but IPC error on TRIGGER_START with XTOS. Let's see if that happens with Zephyr as well ....

[   65.907869] snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[   65.907949] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump start ]------------
[   65.907951] sof-audio-pci-intel-cnl 0000:00:1f.3: DSP panic!
[   65.907952] sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[   65.907955] sof-audio-pci-intel-cnl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[   65.907959] sof-audio-pci-intel-cnl 0000:00:1f.3: status code: 0xdead006 (unknown)
[   65.908034] sof-audio-pci-intel-cnl 0000:00:1f.3: reason: runtime exception (0x6)
[   65.908036] sof-audio-pci-intel-cnl 0000:00:1f.3: trace point: 0x00004000
[   65.908038] sof-audio-pci-intel-cnl 0000:00:1f.3: panic at :0
[   65.908039] sof-audio-pci-intel-cnl 0000:00:1f.3: error: DSP Firmware Oops
[   65.908041] sof-audio-pci-intel-cnl 0000:00:1f.3: error: Exception Cause: LoadStorePIFDataErrorCause, Synchronous PIF data error during LoadStore access
[   65.908042] sof-audio-pci-intel-cnl 0000:00:1f.3: EXCCAUSE 0x0000000d EXCVADDR 0x00000014 PS       0x00060725 SAR     0x00000000
[   65.908046] sof-audio-pci-intel-cnl 0000:00:1f.3: EPC1     0xbe02b73e EPC2     0xbe0308cd EPC3     0x00000000 EPC4    0x00000000
[   65.908048] sof-audio-pci-intel-cnl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[   65.908050] sof-audio-pci-intel-cnl 0000:00:1f.3: EPS2     0x00060b20 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[   65.908052] sof-audio-pci-intel-cnl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000222
[   65.908055] sof-audio-pci-intel-cnl 0000:00:1f.3: stack dump from 0xbe231ac0
[   65.908057] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231ac0: 9e236200 9e236200 00000062 00000000
[   65.908060] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231ad0: fe03e1b8 be231b10 be231b40 0000000d
[   65.908062] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231ae0: be231ac0 be231ca0 00000000 00000000
[   65.908064] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231af0: 9e0fe600 00000000 00000000 00000001
[   65.908066] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231b00: 7e02b73e be231b40 9e0fe800 00000000
[   65.908067] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231b10: be02b73e be231b70 00000004 00000000
[   65.908069] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231b20: 00000001 9e0fe75c 9e0fe600 9e0fe75c
[   65.908071] sof-audio-pci-intel-cnl 0000:00:1f.3: 0xbe231b30: be02b65b be231d40 9e09e600 fe02b73e
[   65.908072] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump end ]------------

plbossart avatar Aug 12 '22 07:08 plbossart

With Zephyr I only get an IPC timeout, not DSP panic.

   29.457822] snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[   29.960132] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx timed out for 0x60040000 (msg/reply size: 12/12)
[   29.960143] sof-audio-pci-intel-cnl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[   29.960147] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ IPC dump start ]------------
[   29.960157] sof-audio-pci-intel-cnl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000003 rirb 00
[   29.960162] sof-audio-pci-intel-cnl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[   29.960175] sof-audio-pci-intel-cnl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[   29.960178] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ IPC dump end ]------------
[   29.960181] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump start ]------------
[   29.960183] sof-audio-pci-intel-cnl 0000:00:1f.3: IPC timeout
[   29.960186] sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[   29.960194] sof-audio-pci-intel-cnl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[   29.960281] sof-audio-pci-intel-cnl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000

plbossart avatar Aug 12 '22 08:08 plbossart

I think I found the problem, it's a conceptual mistake IMHO.

For capture, the 'start' component for hw_params and prepare is the host and the propagation works upstream to the DAI.

When the boundary with a different pipeline is found, there are a couple of error checks, and unfortunately, the criterion for error check are based on the SINK for the new pipeline, which is of course an ENDPOINT_NODE for capture. This doesn't happen for playback because the SINK for the new pipeline is a DAI! I don't think these tests have any merit, removing them in hw_params and prepare removes the errors and capture starts normally. I think we would have had a similar error with 3 pipelines on playback, the pipeline in the middle would have started with an endpoint node and ended with an endpoint node.

For PR https://github.com/thesofproject/sof/pull/6131 I just removed the test for now in the updated code. @lgirdwood it would help quite a bit if you and your team could review this change. Thanks!

plbossart avatar Aug 15 '22 19:08 plbossart

https://sof-ci.01.org/sofpr/PR6131/build1189/devicetest/ is all good except for a TIMEOUT for APL-Up2-NOCODEC in suspend-resume tests. This is likely not introduced by this PR.

plbossart avatar Aug 16 '22 10:08 plbossart

SOFCI TEST

plbossart avatar Aug 16 '22 10:08 plbossart

Last update to include a demux on capture. The new topology is as follows:

sof-cnl-nocodec

Local tests show clean audio capture on PCM 0 and PCM4, with an issue however when pipelines are enabled concurrently. We must replicate the logic from the mixer and keep the demux active on capture as long as at least one client is active.

plbossart avatar Aug 16 '22 13:08 plbossart

Updates for stop seem to work in local tests...

plbossart avatar Aug 16 '22 15:08 plbossart

all tests ok so far, except for alsa-bat which fails due to a bad mixer combination. I don't know how many times I've asked that we stop setting mixer values manually with specific strings...

plbossart avatar Aug 18 '22 08:08 plbossart

Intel test plan 14732 started

plbossart avatar Aug 18 '22 08:08 plbossart

suggested sof-test fix in https://github.com/thesofproject/sof-test/pull/951

plbossart avatar Aug 18 '22 10:08 plbossart

Intel test plan 14732 started

The only major fail is for ADLP_RVP_NOCODEC_CI but that's a different topology. It seems like a known issue. @keqiaozhang @lyakh can you confirm if that platform was functional before this PR?

plbossart avatar Aug 22 '22 12:08 plbossart

Intel test plan 14732 started

The only major fail is for ADLP_RVP_NOCODEC_CI but that's a different topology. It seems like a known issue. @keqiaozhang @lyakh can you confirm if that platform was functional before this PR?

Not sure about before this PR, but in general I don't remember that platform failing recently. E.g. here https://github.com/thesofproject/sof/pull/6073 all tests passed.

lyakh avatar Aug 23 '22 07:08 lyakh

SOFCI TEST

plbossart avatar Sep 13 '22 09:09 plbossart

All sof-test dependencies should be fixed now with volumes reset to 0dB We can run this even with Zephyr in multi-core mode, with the caveat that secondary cores are not properly powered-down.

Functionally this should help with an IPC3/Zephyr baseline to compare against the same changes with IPC4/topology2 that we need to do.

plbossart avatar Sep 13 '22 09:09 plbossart

@plbossart not sure why cmocka is failing. Could be a test using a modified module. Can you check locally as the container could be out of date.

-- Found Git: /usr/bin/git (found version "2.37.2") 
-- SOF version.cmake starting at [20](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:21)22-08-22T12:07:35Z UTC
-- /home/runner/work/sof/sof is at git commit with parent(s):
commit 5850a672782a389f8953f103b5b65e23bc230ba8 44a5200c87625588f0028aa08d560e68f2b8dc82 67ad94d064f61504ec6bf45d4eb[21](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:22)dcb3823658d (HEAD, pull/6131/merge)
Merge: 44a5200 67ad94d
Author: Pierre-Louis Bossart <[email protected]>
Date:   Mon Aug [22](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:23) 12:03:49 2022 +0000

    Merge 67ad94d064f61504ec6bf45d4eb21dcb38[23](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:24)658d into 44a5200c876[25](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:26)588f00[28](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:29)aa08d560e68f2b8dc82
fatal: No names found, cannot describe anything.
CMake Warning at scripts/cmake/version.cmake:70 (message):
-- GIT_TAG / GIT_LOG_HASH : v0.0-0-g0000 / 5850a67
  git describe found / nothing starting with 'v'.  Shallow clone?
Call Stack (most recent call first):
  CMakeLists.txt:99 (include)


-- Source content hash: 44481a34. Note: by design, source hash is broken by config changes. See #3890.
-- Generated new /home/runner/work/sof/sof/build_ut_defs/apollolake_defconfig/generated/include/sof_versions.h
-- (Re-)generating /home/runner/work/sof/sof/build_ut_defs/apollolake_defconfig/generated/.config
   and /home/runner/work/sof/sof/build_ut_defs/apollolake_defconfig/generated/include/autoconfig.h
   from /home/runner/work/sof/sof/src/arch/xtensa/configs/apollolake_defconfig
warning: the int symbol CORE_COUNT (defined at src/arch/host/Kconfig:5, src/platform/Kconfig:[29](https://github.com/thesofproject/sof/actions/runs/2903713710/jobs/4620607867#step:3:30)9) has a non-int default MP_NUM_CPUS (undefined)
-- Done, future changes to /home/runner/work/sof/sof/src/arch/xtensa/configs/apollolake_defconfig
   will be IGNORED by this build directory! The primary .config
   file is now 'generated/.config' in the build directory.
CMake Warning at test/cmocka/CMakeLists.txt:117 (message):
  SKIP alloc test on HOST, built but not run
Call Stack (most recent call first):
  test/cmocka/src/lib/alloc/CMakeLists.txt:4 (cmocka_test)

lgirdwood avatar Sep 13 '22 13:09 lgirdwood

@lgirdwood I don't even know what cmoka is.

plbossart avatar Sep 13 '22 13:09 plbossart

@lgirdwood I don't even know what cmoka is.

There is a helper script ./scripts/run-mocks.sh which will run the mocks as local commnd line applications.

...snip...

96% tests passed, 2 tests failed out of 50
Errors while running CTest

Output from these tests are in: /home/runner/work/sof/sof/build_ut_defs/apollolake_defconfig/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Total Test time (real) =   0.08 sec

The following tests FAILED:
	 13 - mux_copy (Failed)
	 14 - demux_copy (Failed)
make: *** [Makefile:71: test] Error 8
Error: Process completed with exit code 2.

lgirdwood avatar Sep 13 '22 13:09 lgirdwood

@lgirdwood I have no idea what this test does. Git bisect points to this:

54f7179eb9d89e963853a73a6ae5594a118ee3d1 is the first bad commit
commit 54f7179eb9d89e963853a73a6ae5594a118ee3d1
Author: Pierre-Louis Bossart <[email protected]>
Date:   Wed Aug 17 09:43:05 2022 +0200

    mux: don't propagate prepare on capture if one of the sinks is active
    
    One more condition that didn't work for capture.
    
    Signed-off-by: Pierre-Louis Bossart <[email protected]>

 src/audio/mux/mux.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

but the test seg faults with it.

Running build_ut/test/cmocka/src/audio/mux/demux_copy
+ ./build_ut/test/cmocka/src/audio/mux/demux_copy
1..27
(helper.c:309) comp new 0x562128b55000U type 18 id 0.0
(mux.c:171) mux_new()
(mux.c:103) mux_set_values()
(ipc-helper.c:43) buffer new size 0x10 id 0.0 flags 0x0
(ipc-helper.c:43) buffer new size 0x10 id 1.0 flags 0x0
(ipc-helper.c:43) buffer new size 0x10 id 2.0 flags 0x0
(ipc-helper.c:43) buffer new size 0x10 id 3.0 flags 0x0
(ipc-helper.c:43) buffer new size 0x10 id 5.0 flags 0x0
(mux.c:681) mux_prepare()
not ok 1 - test_demux_copy_s16le_mask_0 Could not run test: Test failed with exception: Segmentation fault(11)Test setup failed
(helper.c:309) comp new 0x562128b55000U type 18 id 0.0

how do I debug further?

plbossart avatar Sep 13 '22 13:09 plbossart

@plbossart there will be a C mock for mux and demux in test/cmocka/src/audio. I would add some debug there and rerun the mocks.

lgirdwood avatar Sep 13 '22 14:09 lgirdwood

Cmoka doesn't set the components @lgirdwood, this will make the tests pass:

-       /* check each mux sink state on capture state */
-       if (dev->pipeline->source_comp->direction == SOF_IPC_STREAM_CAPTURE) {
+       /*
+        * check each mux sink state on capture state. The check on the pipeline
+        * is only needed for cmocka tests to run without segfault
+        */
+       if (dev->pipeline && dev->pipeline->source_comp->direction == SOF_IPC_STREAM_CAPTURE) {

would that work for you?

plbossart avatar Sep 13 '22 14:09 plbossart

@lyakh can you take a look thanks. @plbossart did valgrind give any pointers to the segfault ? i.e. do we need to put a fix in up the stack if its a higher level failure

I didn't run valgrind, just experimentally found out that dev->pipeline is NULL for cmoka tests and that any dereference will lead to a seg fault. I guess that's fair, it's a component test, it's not clear that it needs to be part of a pipeline for a unit test.

plbossart avatar Sep 14 '22 15:09 plbossart

@plbossart thanks for the great effort. I may not be able to fully understand the design only by code browsing. I have a few questions after looking into the implementation. IIUC the design concept is to slice the topology into multiple pipelines which have the joint component (MIXER/MUX) on one end at least. For example of the multi-stream capture you have shown above https://github.com/thesofproject/sof/pull/6131#issuecomment-1216640288, it is divided into 3 pipelines with index (2), (32), and (35). (please correct if I'm wrong)

  1. Pipeline (32) and (35) are PCM-ended so they can be triggered by IPC. How do we trigger pipeline (2) (and who)?
  2. AFAIK for ll-scheduler pipeline tasks with the same sched_comp will be put into one schedule together, e.g. pipeline (2), (32), and (35). Do we have these tasks sorted? From my perspective (2) should be prior to others, otherwise it may produce the input delay or phase shift (between (32) and (35)).
  3. I used to implement split-joined type pipeline in #5259 long time ago. Do you think it is achievable while multi-stream playback and capture are ready for support? (just out of curiosity, we don't have any use case for split-joined type pipeline now.)

johnylin76 avatar Oct 04 '22 03:10 johnylin76

@plbossart thanks for the great effort.

Thanks for the comments @johnylin76, you're the first one to ask the right questions :-)

I may not be able to fully understand the design only by code browsing. I have a few questions after looking into the implementation. IIUC the design concept is to slice the topology into multiple pipelines which have the joint component (MIXER/MUX) on one end at least. For example of the multi-stream capture you have shown above #6131 (comment), it is divided into 3 pipelines with index (2), (32), and (35). (please correct if I'm wrong)

  1. Pipeline (32) and (35) are PCM-ended so they can be triggered by IPC. How do we trigger pipeline (2) (and who)?

For capture, the trigger is propagated upstream within the firmware, so when you trigger a PCM the firmware will end-up starting the SSP as well. DPCM will take care of the codec parts that are beyond the scope of the firmware can control.

This comment is ONLY true for IPC3. For IPC4 the kernel driver will have to trigger each pipeline separately, the firmware shall not propagate state changes across pipelines. @lyakh has a patch for this.

  1. AFAIK for ll-scheduler pipeline tasks with the same sched_comp will be put into one schedule together, e.g. pipeline (2), (32), and (35). Do we have these tasks sorted? From my perspective (2) should be prior to others, otherwise it may produce the input delay or phase shift (between (32) and (35)).

There is a very large scheduling issue here that this PR does not even being to address. When we have multiple pipelines going into the same mixer, it's possible that one of the branches does not complete within the time allocated by the LL scheduling. In other words, it's silly to assume that anything based on mixers with processing added on branches can be handled with a 1ms slot. We'll have to revisit this but that's not a topology issue, rather a firmware scheduling one. @lgirdwood is aware of the issue.

  1. I used to implement split-joined type pipeline in topology: sof-adl-max98357a-rt5682-waves-2way: solution without changing waves wrapper code #5259 long time ago. Do you think it is achievable while multi-stream playback and capture are ready for support? (just out of curiosity, we don't have any use case for split-joined type pipeline now.)

I think it will lead to to the same sort of issues where a pipeline can end-up not being connected to a host or dai. There are way too many assumptions left and right of that nature that will be exposed by 'unusual' topologies.

plbossart avatar Oct 04 '22 07:10 plbossart

  1. AFAIK for ll-scheduler pipeline tasks with the same sched_comp will be put into one schedule together, e.g. pipeline (2), (32), and (35). Do we have these tasks sorted? From my perspective (2) should be prior to others, otherwise it may produce the input delay or phase shift (between (32) and (35)).

There is a very large scheduling issue here that this PR does not even being to address. When we have multiple pipelines going into the same mixer, it's possible that one of the branches does not complete within the time allocated by the LL scheduling. In other words, it's silly to assume that anything based on mixers with processing added on branches can be handled with a 1ms slot. We'll have to revisit this but that's not a topology issue, rather a firmware scheduling one. @lgirdwood is aware of the issue.

Do we expect that the branch tasks run in parallel during the schedule period? I mostly work on adl-004-drop-stable so the tasks still be processed one by one in my knowledge. It's easier to imagine the task flow for mixer streams in the latter case, while it will be complicated if parallel processing is supported.

  1. I used to implement split-joined type pipeline in topology: sof-adl-max98357a-rt5682-waves-2way: solution without changing waves wrapper code #5259 long time ago. Do you think it is achievable while multi-stream playback and capture are ready for support? (just out of curiosity, we don't have any use case for split-joined type pipeline now.)

I think it will lead to to the same sort of issues where a pipeline can end-up not being connected to a host or dai. There are way too many assumptions left and right of that nature that will be exposed by 'unusual' topologies.

Agree on that. The processing order is critical for split-joined type pipeline. And ideally it should be able to be implemented by uniting branches into one-way path as alternative. Let's rule this topic out.

johnylin76 avatar Oct 04 '22 09:10 johnylin76

3. AFAIK for ll-scheduler pipeline tasks with the same sched_comp will be put into one schedule together, e.g. pipeline (2), (32), and (35). Do we have these tasks sorted? From my perspective (2) should be prior to others, otherwise it may produce the input delay or phase shift (between (32) and (35)).

There is a very large scheduling issue here that this PR does not even being to address. When we have multiple pipelines going into the same mixer, it's possible that one of the branches does not complete within the time allocated by the LL scheduling. In other words, it's silly to assume that anything based on mixers with processing added on branches can be handled with a 1ms slot. We'll have to revisit this but that's not a topology issue, rather a firmware scheduling one. @lgirdwood is aware of the issue.

Do we expect that the branch tasks run in parallel during the schedule period? I mostly work on adl-004-drop-stable so the tasks still be processed one by one in my knowledge. It's easier to imagine the task flow for mixer streams in the latter case, while it will be complicated if parallel processing is supported.

These changes cannot be envisioned for a production branch. You will notice that these changes are only applicable to the nocodec test topology, as a preparation for future work on newer platforms. Multi-stream support is not in scope for ADL, only later, and this is only a 'pipe cleaner' to expose issues across the stack.

Again dealing with more complicated topologies with significant processing on different branches possibly exceeding the allocated MCPS in a LL 1ms slot will require changes in the firmware scheduling. We are only deal with a 'best case' here, with only volume control.

plbossart avatar Oct 04 '22 09:10 plbossart

@lgirdwood @plbossart to reply to several comments, made so far in this PR: I don't think handling of the source_c->source (and similar) pointer has anything to do with buffer_acquire() / buffer_release(). In most cases buffer_acquire() does just one thing: obtain a cached pointer for the object. That is it. No locking, no cache synchronisation. And buffer_release() is a NOP. Only if a buffer is shared (which is very rarely the case in our topologies, if ever), then also locking and cache synchronisation is performed. But even in that case the cached buffer API does nothing about .source and .sink pointers - those are completely under the application logic control.

lyakh avatar Oct 05 '22 06:10 lyakh

Let's try differently @lyakh.

What is the rule telling us when source_c->source or sink_c->sink can be safely dereferenced?

plbossart avatar Oct 05 '22 15:10 plbossart

Let's try differently @lyakh.

What is the rule telling us when source_c->source or sink_c->sink can be safely dereferenced?

@plbossart I don't think there are any special rules for those pointers - just as any normal pointers they can be safely dereferences as long as they're valid. Those are pointers embedded in the buffer structure. So code like

source_c = buffer_acquire(source);
source = source_c->source;
buffer_release(source_c);
if (source)
	source->state = new_state();

is in principle valid as far as buffer object management is concerned. What are the rules of handling source and sink component objects, or any other objects to which buffer points - is a different question. So it is up to the specific algorithm to know what and when to do with those referenced objects - how to protect them, which contexts to protect them from, etc.

lyakh avatar Oct 06 '22 08:10 lyakh

Let's try differently @lyakh. What is the rule telling us when source_c->source or sink_c->sink can be safely dereferenced?

@plbossart I don't think there are any special rules for those pointers - just as any normal pointers they can be safely dereferences as long as they're valid. Those are pointers embedded in the buffer structure. So code like

source_c = buffer_acquire(source);
source = source_c->source;
buffer_release(source_c);
if (source)
	source->state = new_state();

is in principle valid as far as buffer object management is concerned. What are the rules of handling source and sink component objects, or any other objects to which buffer points - is a different question. So it is up to the specific algorithm to know what and when to do with those referenced objects - how to protect them, which contexts to protect them from, etc.

Sorry @lyakh, the source/sink buffers are absolutely not related to the algorithm. They are described in topology and linked to the algorithm module. If those links are wrong and pointers invalid, what would each module actually do? I think we're really confusing perpetrator and victim here...

plbossart avatar Oct 06 '22 13:10 plbossart

What are the rules of handling source and sink component objects, or any other objects to which buffer points - is a different question. So it is up to the specific algorithm to know what and when to do with those referenced objects - how to protect them, which contexts to protect them from, etc.

Sorry @lyakh, the source/sink buffers are absolutely not related to the algorithm. They are described in topology and linked to the algorithm module. If those links are wrong and pointers invalid, what would each module actually do? I think we're really confusing perpetrator and victim here...

@plbossart sorry, I meant "algorithm" in the generic CS sense, not as our "processing algorithm module." As for "what to do" - I think I've seen such cases several times when those pointers were invalid. The only explanation I have so far is that it can happen when two pipelines are connected and then they're being stopped and torn down and sometimes it happens in such cases that while one pipeline is being dismantled the other one hits a dangling buffer. Probably the best would be to try and solve this issue centrally - if possible, but it might not always be possible. Buffers between pipelines always belong to one of the connected pipelines, right? And that is defined in the topology. So, in a topology like

                    /--- [buf1] - [pipe B] -
[pipe A] - [fork] <
                    \--- [buf2] - [pipe C] -

as long as buf1 belongs to pipe B and buf2 belongs to pipe C it should be possible to disconnect them atomically so that pipe A never sees those buffers dangling. Is this how our topology-2 topologies defined? But in extreme examples like

                        [pipe E] ---\
                                      > [fork2] - [pipe D]
                     /--- [buf1] ---/
[pipe A] - [fork1] <
                     \--- [buf2] - [pipe C] -

i.e. where a buffer connects two forks, whether you make buf1 belong to pipe A or to pipe D - there's always a possibility that it remains dangling. So the code for fork1 or fork2 will have to handle it. And handling it in this case probably should mean ignoring it - just as if there were no buffer there at all. Is that what you were asking about or have I misunderstood?

lyakh avatar Oct 07 '22 06:10 lyakh