AliceO2
Need to clarify how to handle multiple data sets over the same channel in DPL
some first ideas on that topic
User story: a processor defines one channel. In one single invocation of the processing callback it wants to send multiple data sets of identical `OutputSpec` over the same channel, e.g. via multiple calls to `snapshot` or `adopt`. The receiver might expect these messages to arrive in the same `InputRecord` of one single invocation of its processing function.
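To make the ambiguity concrete, here is a minimal, self-contained C++ sketch. The types are hypothetical simplifications made up for illustration, not the actual DPL API: `SpecKey`, `InputRecordModel` and `sendTwice` do not exist in O2.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified model: a "spec key" stands in for the
// origin/description/subSpec triple of an OutputSpec.
using SpecKey = std::string;

// The receiver's record for one invocation, keyed by spec alone: two sends
// with an identical spec end up indistinguishable except by arrival order.
struct InputRecordModel {
  std::multimap<SpecKey, std::string> parts;
  std::size_t countFor(SpecKey const& key) const { return parts.count(key); }
};

// Sender side: two snapshot-like calls with an identical spec in one
// invocation of the processing callback.
inline InputRecordModel sendTwice() {
  InputRecordModel rec;
  rec.parts.emplace("TST/DATA/0", "first data set");
  rec.parts.emplace("TST/DATA/0", "second data set");
  return rec;
}
```

The receiver sees two parts under the same key and cannot tell, from the spec alone, whether to expect one data set or two in this invocation.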
There are different use cases.
- Event-like: many users might expect an event-like entity, i.e. everything sent in one execution of the sender's processing callback is expected to be accessible together in one single execution of the receiver's processing callback.
- Streaming: everything that arrives at the processor is immediately processed, without relating it to other data.
These different use cases probably need to be clearly expressed in the workflow definition, and this is definitely something we have to clarify to avoid confusion.
Should you use subspecification for that?
Not necessarily. One can think of sending multiple parts with the same `OutputSpec`. Maybe it's not very likely and we do not need to support this right away. Right now it is possible to do so, and there is a potential for misunderstanding on the receiver side.
We can make a policy requiring a unique `OutputSpec` per object and processor call, and check this on the framework level. I think this would be the easiest measure to catch problematic cases.

If we agree on that, we can label this issue as a bug and make a patch soon to check for duplicate `OutputSpec`s.
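Such a framework-level check could look roughly like the following self-contained C++ sketch. `SpecId` and `UniqueSpecGuard` are hypothetical names for illustration, not part of the DPL code base; the real check would live in the framework's output machinery.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-in for DPL's OutputSpec: data origin,
// description and sub-specification together identify one output.
struct SpecId {
  std::string origin;
  std::string description;
  std::uint32_t subSpec;
  bool operator<(SpecId const& o) const {
    return std::tie(origin, description, subSpec) <
           std::tie(o.origin, o.description, o.subSpec);
  }
};

// Guard the framework could keep per processing call: registering the same
// spec twice within one invocation is flagged as an error.
class UniqueSpecGuard {
 public:
  // Throws if the same spec was already sent in this invocation.
  void registerSend(SpecId const& spec) {
    if (!mSeen.insert(spec).second) {
      throw std::runtime_error("duplicate OutputSpec in one processing call: " +
                               spec.origin + "/" + spec.description);
    }
  }
  // To be called by the framework at the start of each invocation.
  void reset() { mSeen.clear(); }

 private:
  std::set<SpecId> mSeen;
};
```

A second send with the same origin/description/subSpec within one call would then fail loudly instead of silently producing an ambiguous `InputRecord` on the receiver side.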
I actually think that the fact we allow it right now is a bug. As you point out, it would become impossible for the receiving side to decide how many messages to wait for, so I do think that "multiple parts" should really mean "multiple subspecifications" (or maybe a newly introduced field, if you see a need for that for detector-specific stuff?).