lingua-franca
lingua-franca copied to clipboard
Multiports and Banks of Reactors
I'm implementing multiports and banks of reactors, and I've noticed some terminology differences. I think it would be a good idea to agree on terminology.
For ports defined as:
input[4] in:int;
do we call it a 'multiport', 'multiplex port', or 'array port'? I would like to argue against 'array port' because it could create confusion with this:
input in:int[4];
I lean slightly towards 'multiport' because it is shorter than 'multiplex port', but I don't feel strongly about this.
Should we refer to 4 as the 'size', 'length', or 'width'? I'm leaning towards 'width' because 'size' and 'length' are already widely used for ordinary arrays.
For reactors instantiated as:
foo = new[4] Foo();
do we call them 'bank of reactors', 'multi reactors', or 'array of reactors'? I'm leaning slightly towards 'bank of reactors' because we won't use the word 'bank' for anything else, I think.
Again, I'm leaning towards 'width' for the number of instances in a bank of reactors.
Next, in Christian's realization in Cpp, if Foo
has an id
parameter, then that parameter will be set to 0, 1, 2, or 3 for each of the instances created. Should we worry about id
becoming a sort-of reserved parameter name? We could pick something a bit less likely to conflict with other uses, like bank_position
or some such.
So to be concrete, I propose the following terminology:
- multiport
- bank of reactors
- width for both
- bank_position for the ID.
Comments? Your preferences?
I have identified an ambiguity in the multiports/reactor bank mechanism. Consider this:
a = new Foo();
b = new[4] Bar();
a.out -> b.in;
If a.out is not a multiport, then its output should presumably be broadcast to all the instances of Bar. If a.out is a multiport, then presumably each output should be sent to a distinct instance of Bar. So far so good.
But what if the multiport has width 3 instead of 4?
I will assume for now in this case that the fourth instance of Bar should have a dangling input and receive no input.
But what if the multiport has width 3 instead of 4?
It'd say this should be reported as an error in the LF compiler...
The multi ports and reactors as they work now are more a case study and not a fully thought through design. In particular, there are checks missing in the validator that identify such issues as you showed them above. We could either consider this an error as Marten suggested or produce a warning telling the programmer about the ambiguity. However, in the case of a warning we would probably need some additional syntax to enable the programmer to explicitly state what she wants without producing the warning.
About terminology: I think it would be preferable to use the same terms for both ports and reactors. I like 'multiport', but for 'multireactor' it is not clear by the name what this would mean. So I actually like 'bank of reactors'. Would it make sense to use 'bank of ports' or 'portbank' or 'bankport' for ports?
So far, we aren’t allowing multiport widths to be set by a parameter. We should fix that, but it may not be so easy. In view of that, I think it’s better to have some flexibility. It means every instance of a class has to have the same width. Think, for example, of a reusable Adder that adds up to 4 inputs.
If we issue a warning, then we have to define the behavior anyway. In this case, I would prefer to not issue the warning but instead clearly specify what the behavior will be. The docs will look odd if they say, “don’t do this, but if you do it anyway, this is the specified behavior...”. So I think it should either be an error, rejected by the validator, or allowed and documented.
Finally, if a reaction is triggered by an event on any channel of a multiport (bank of ports?), then the reaction really needs to test for presence of inputs anyway... so increased flexibility on widths doesn’t really increase the burden.
So I’m leaning towards allowing mismatched widths. The current implementation in C allows it, and the regression tests test for it.
The grammar allows an ArraySpec to be "ofVariableLength", denoted by '[]'. This is useful for types, but I don't see any use for this for multiports or banks of reactors. Am I missing something?
I think there are some valid use-cases for []
(But none of them are implemented at the moment). For instance, the width could be automatically inferred by the compiler. If we connect an output[4]
to an input[]
, then the compiler could infer the input width to be 4. This is particularly useful if we do not know in advance how wide the multi port should be. However, this might not be required if the width can be given by a parameter.
It could also be useful for implementing a deterministic print reactor or similar library reactors. We could allow something like this:
target Cpp;
reactor Print {
input[] print:std::string;
reaction(print) {=
for (auto& p : print) {
if (p.is_present()) {
std::cout << *p.get();
}
}
=}
}
main reactor Foo {
a = new A();
b = new B();
c = new C();
print = new Print();
a.print->print.print; // connects to print.print[0]
b.print->print.print; // connects to print.print[1]
c.print->print.print; // connects to print.print[2]
}
Finally, I think that ports with []
could be very useful for mutations. I think it shouldn't be too hard to allow dynamic adding and removing of connections.
Ah, yes, inferring widths would be a good thing. It turns out that width inference is a nontrivial problem. See this paper describing the problem and an (imperfect) solution. It is not clear whether there exists a perfect solution.
I am a bit stuck, facing two alternative and mutually incompatible syntax/semantics choices for connections between port, multiports, and reactor banks. The alternatives are:
-
Connections to multiports/banks of reactors are made to available channels in the order in which the connections are specified.
-
Connections explicitly specify which channels/ports connect to which channels/ports.
I don't think these can be mixed, so we must choose between one or the other. Option 1 has less syntax (more compact programs) and is defensible on the assumption that when you define a multiport or bank of reactors, which channel/bank position you connect to is not as important. Arguably, if it were important, you would have defined multiple ports rather than a multiport. Option 2 is more explicit, requiring more syntax, including possibly complex syntax for specifying ranges of channels and strides. It is defensible in cases where which channel/bank position you connect to must be explicitly controlled.
In more detail:
Let in
and out
represent ordinary ports and min
and mout
represent multiports. Let a
, b
, c
... represent ordinary reactors and ba
, bb
, bc
... represent banks of reactors. Consider this connection (under option 1):
a.mout -> b.in;
a.mout -> c.min;
a.mout -> d.in;
Under option 1, the meaning of this is that the multiport channels of a.mout
are used in order. For example, if a.mout_width == 5
and c.min_width == 3
, then the option 2 equivalent would be:
a.mout[0] -> b.in;
a.mout[1:3] -> c.min[0:2];
a.mout[4] -> d.in;
This is more explicit, but if which channel is used is not important, then this just clutters the design. I think it will be necessary to allow parameters, which will significantly complicate the implementation, as in:
a.mout[0] -> b.in;
a.mout[1: c_width + 1] -> c.min[0: c_width];
a.mout[4] -> d.in;
Notice the need for an expression c_width + 1
, and the need to check that c_width == (c_width + 1) - 1
, a check that may be hard to do at compile time (also, this adds arithmetic expressions to the syntax of LF, a really big addition).
For this example, I'm leaning towards option 1 because of its simplicity both in syntax and semantics. Moreover, I don't think any so many checks are needed.
There is one major subtlety which is how to handle mismatches between source and destination widths. My proposal is this:
Use output channels cyclically, stop when all destination channels have connections.
On any connection from a multiport output to anything (ordinary ports, multiports, and banks of reactors), if all destination channels have been supplied with a connection and there are still unused output channels, those channel outputs are dropped. Those outputs go nowhere.
If, on the other hand, not all destination channels have been filled and we run out of output channels, then we cycle around and start again from the output's channel 0. This has some advantages. For example, suppose a.mout_width == b.min_width == c.min_width == 3
. Now consider this connection:
a.mout -> b.min;
a.mout -> c.min;
The meaning of this is multicast. All three channels from a
go to both b
and c
, making this look like and behave like an ordinary connection between ports for the (probably common) case where all the widths are the same. Suppose the widths are not the same, e.g. a.mout_width == 4
and b.min_width == c.min_width == 2
. Then the above connection sends the first two channels to b
and the last two to c
. Of course, the allows some wacky connections as well. For example, if a.mout_width == 2
and b.min_width == 4
, then you can do this:
a.mout -> b.min; // 2 channels of a.mout go to channels 0, 1 of b.min.
a.mout -> b.min; // same 2 channels of a.mout go to channels 2, 3 of b.min.
I am leaning towards option 1 and have verified that it is implementable reasonably easily, more easily than option 2. Option 2 allows richer possible connections, particularly if we support strides, but any connection specifiable with option 2 is also realizable under option 1, possibly requiring more ports and/or intermediate reactors to handle routing of messages.
Banks of reactors would work much like multiports and can be combined with multiports under option 1. Under option 2, I'm not sure such combinations will be so easy. Consider:
a.mout -> b.in;
a.mout -> bc.min;
a.mout -> bd.in;
The same semantics would apply. Use output channels cyclically, stop when all destination channels have connections.
Comments?
For what it's worth, I would like to point out that the mechanism to handle mismatched port widths is not limited to option 1. It could happen in option 2 as well. However, arguably, mismatched ports happening in option 2 can be deferred to purely programming errors while mismatched ports in option 1 can be either intentional or by error.
For option 1, handling mismatched ports via a cyclical/drop policy will handle intentional mismatched widths but might hide programmer errors that in a highly parallel complex program might be very hard to debug (saying from experience).
I first would like to point out that the simplest solution to the problem is to disallow connections of multi ports or reactor banks of different widths. As far as I know, we don't have a use-case for connections of different widths at the moment and, therefore, we could easily defer this until we have a solid use-case. That said, I have some more constructive thoughts I would like to add to the discussion.
I am a bit stuck, facing two alternative and mutually incompatible syntax/semantics choices for connections between port, multiports, and reactor banks.
Are the two approaches really mutual exclusive? I think we should combine them! I will explain in the following what I mean by that.
I personally like code that clearly states what it does without the need for explanation in comments. I don't think that this is the case in your first approach.
a.mout -> b.in;
a.mout -> c.min;
a.mout -> d.in;
This certainly looks clean. But what does it do? When I read these lines of code, I need to gather more information in order to actually understand them. First, I need to know how the algorithm for creating connections decides which port is connected where. Second, I need to know if a
, b
, c
, d
are reactor banks and if a.mount
, b.in
, c.min
, d.min
are multi ports. Moreover, I need to know their width. Only when I then follow the algorithm step by step, I can reconstruct how the connections work. That is not very clean in my opinion.
I agree, however, that your other example is verbose and adds information that might already be clear from the context.
a.mout[0] -> b.in;
a.mout[1:3] -> c.min[0:2];
a.mout[4] -> d.in;
I don't agree though that it clutters the design, as a clean design must also be easily understandable.
I think the decision for one of the two approaches is a matter of style and I don't think that it is a decision that should be made by us language designers. We should allow the programmer to make a decision based on the context and on whether it is obvious what is happening or if it needs further explanation. So in my view, there is nothing wrong with allowing both approaches.
Now if we allow both: What would be the meaning of this example?
a.mout[1:3] -> c.min[0:2];
a.mout -> d.in;
Would d.in
be connected to a.mout[0]
or a.mout[4]
? I don't really know and I assume this is why you say that both approaches are mutual exclusive. However, we could add validator rules that forbid the combination of both approaches. We could simply require that if [...]
is used once on a bank or multiport, it has to be used on all references to this same bank or port.
About cyclic creation of connections
a.mout -> b.min; // 2 channels of a.mout go to channels 0, 1 of b.min.
a.mout -> b.min; // same 2 channels of a.mout go to channels 2, 3 of b.min.
This is mind bending! What would happen if the width of a.mout
is 3 and the width of b.min
is 4? I think this is exactly why we need the possibility to provide explicit indexes. While I think it is fine to have a reasonable default behavior, this cyclic approach goes beyond reasonable in my opinion. I think it is fine for the compiler to error out on corner cases and ask the programmer to provide explicit indexes.
About expressions
I think there is a mistake in your example and it should really be:
a.mout[0] -> b.in;
a.mout[1: c_width] -> c.min[0: c_width-1];
a.mout[4] -> d.in;
Also, it is not obvious that c_width
is the width c.min
and not c
. We could play a little trick here and introduce len
as a keyword so that we can write a.mout[1:len(c.min)]
I agree that expressions would significantly complicate things, but at the moment I don't see the need to introduce them.
I actually had an idea that would provide a compact, simple syntax and at the same time is explicit about all the connections being made.
What if we would allow concatenation of ports? I am not sure what the concatenation operator should be, but let's go with +
for now. With concatenation, Edward's example could be written as follows:
a.mout -> b.in + c.min + d.in;
This would have the same semantics as proposed by Edward. Iterate over all output ports and connect them one by one to the inputs going from left to right. We can even allow concatenation on both sides:
a.mout + b.mout -> b.in + c.min + d.in;
I think it would be sensible to require that the width of all ports on the left of ->
shall match the width of all ports on the right. Then there is no room for ambiguity or interpretation as there is a strict one to one mapping of outputs to inputs.
However, it would be reasonable to provide a mechanism to connect ports where the width does not match. We could do that by introducing a special stub
port. This is analogous to what some hardware modeling frameworks do. Consider an input port in
of width 6 and an output port out
of width 4. We could connect them like this:
out + stub[2] -> in;
This allows for arbitrary connection patterns while making it 100% clear what happens.
Consider the examples Edward gave for his cyclic connection mechanism. They could be expressed like this:
a.mout -> b.min + c.min; // width(a.mout) == width(b.min) + width(c.min)
and
a.mout -> b.min; // 2 channels of a.mout go to channels 0, 1 of b.min.
a.mout -> b.min; // same 2 channels of a.mout go to channels 2, 3 of b.min
There is a clean distinction of both cases in the syntax!
This is a great idea! It would be much easier to implement, I think. I got an implementation working for my option 1 proposal, which I pushed, and it's pretty horrifically complicated. I think it wouldn't be hard to modify (and simplify) to use this syntax instead.
However, your last example doesn't work:
a.mout -> b.min; // 2 channels of a.mout go to channels 0, 1 of b.min.
a.mout -> b.min; // same 2 channels of a.mout go to channels 2, 3 of b.min
It doesn't satisfy the requirement that the widths on both sides match. I suppose we could do this:
a.mout -> b.min + stub[2]; // 2 channels of a.mout go to channels 0, 1 of b.min.
a.mout -> stub[2] + b.min; // same 2 channels of a.mout go to channels 2, 3 of b.min
Validating this, however, could be a nightmare.
If stub has this meaning, then it's not just syntactic sugar.
Right, I misread the example while writing my comment.
I think a better solution would be this:
a.mout + a.mout -> b.min
I like it because it is quite literal about what is happening.
The problem with your example is that b.min
appears two times on the RHS of ->
. The validator would need to make sure that both lines indeed refer to different indexes of b.min
. I think this would bring us again close to 'pretty horrifically complicated'. The simpler solution would be to require that a multi input port may only occur once on the RHS of ->
(same as for regular ports)
Ah yes, good idea. If we proceed with this, can we get rid of the explicit indexes in connections, as in this example:
a.mout[0] -> b.in;
a.mout[1: c_width] -> c.min[0: c_width-1];
a.mout[4] -> d.in;
? I would prefer that we not provide multiple ways to accomplish the same thing, and to me, this syntax is much cleaner:
a.mout -> b.in + c.min + d.in;
The explicit constant numbers in the indexes bother me. They are very error prone (you even pointed out that my example had an error!). This new syntax is less error prone and errors are more easily caught.
As for the +
, perhaps :
or |
or ||
instead?
a.mout -> b.in : c.min : d.in;
a.mout -> b.in | c.min | d.in;
a.mout -> b.in || c.min || d.in;
The notation ||
is pretty common for "parallel composition".
Ah yes, good idea. If we proceed with this, can we get rid of the explicit indexes in connections, as in this example:
Yes, I don't see any need for that.
As for the +, perhaps : or | or || instead?
I like |
. We are using :
already for types, and using both .
and :
in combination in a single line is potentially confusing. As you point out ||
has other meanings in other languages (also logic or in C/C++). So I would be voting for |
.
I would prefer ..
(Lua) or &
(Ada, AppleScipt, Visual Basic) over |
.
I have the same problem with ..
as with :
. If you leave out the spaces, it is difficult to see what is happening:
a.mout -> b.in:c.min:d.in;
a.mout -> b.in..c.min..d.in;
Another option would be a plain comma ,
:
a.mout -> b.in,c.min,d.in;
My personal preference is still |
though:
a.mout -> b.in|c.min|d.in;
I'm also fine with ,
, ||
, or |
...
On the question of ||
being used in other languages, I think the use here would be strongly consistent with common usage. In formal methods, ||
is almost universally used to mean "parallel composition." It would have that same meaning here:
a.mout -> b.in || c.min || d.in;
means "feed data from a.mout
to b
, c
, and d
to execute in parallel." So I think this notations emphasizes the concurrency in LF and will resonate strongly with formal methods people.
The common usage of ||
for disjunction is also consistent. You can read the above of as "the output of a
goes to b
, c
, OR d
." In fact, we could introduce the notation:
a.mout -> b.min && c.min;
to mean "the output of a
goes to b
AND c
." This seems clearer than
a.mout -> b.min;
a.mout -> c.min;
particularly if these two lines are separated by several others.
So I'm leaning strongly towards ||
.
I guess I see it as a simple concatenation rather than parallel composition. However, I am also fine with using ||
.
It is an interesting idea to introduce a syntax for multicast (although I am confused by the connection to conjunction and disjunction, I don't really see it). I am just not sure what happens if both operators would be combined. Is it cleaner to write
a.mout -> b.min || c.min && (d.min || e.min)
or
a.mout -> b.min || c.min;
a.mout -> stub[2] || d.min || e.min; // assuming that b.min has width 2
?
If we introduce &&
, would we then forbid multiple occurrences of the same port on the LHS of ->
? If so, how could I accomplish this:
a.mout -> b.min;
a.mout || c.mout -> d.min;
?
All good points. Let's keep it simple and just offer ||
.
In converting our parallel execution test, Threaded.lf and ThreadedThreaded.lf, I discovered an awkwardness with the new syntax. In this test, we want to broadcast an ordinary output port from reactor a
to a bank of reactors t
. This has to be done as follows:
a.out || a.out || a.out || a.out -> t.in;
I do not see any way to parameterize the width of the bank now because the connection needs to enumerate the outputs. If we do this:
a.out -> t.in;
then we get a warning and only connection gets made.
One possible solution is that whatever is on the left side gets automatically iterated until it fills whatever is on the right and the warning is issued only if the width on the left does not divide the width on the right. Thoughts about this?
I think we need to issue a warning if the left hand side has more than one port and the widths don’t match. This way we are intentionally introducing a special notation for broadcast:
a.out -> t.in
Moreover, this eliminates unnecessary complications if the user doesn’t know about ||
. However, once ||
is used on the left hand side even once, I think it is okay to require the user to be explicit about the port connections.
As an extension to what I said previously, I think it might be a nice feature to assign the right most predicate on the left hand side as a broadcast port. For example, in
foo.out || bar.out -> t.in
bar.out
will be broadcasted to the rest of the ports in t
. This could be a really useful feature if it is well-documented.
Edit: A warning should still be issued in this case and the broadcast feature can be explained in that warning.
If we simply interpret a statement like a.out -> t.in
as "broadcast" then there is no way to specify "unicast" anymore. We could use an ellipsis to break the ambiguity. I.e:
a.out -> t.in // unicast
and
a.out || ... -> t.in // broadcast
Note that this would more natural with a comma:
a.out, ... -> t.in // broadcast
I agree with introducing a new notation to distinguish between "broadcast" and a normal connection ("unicast"). This can make the program more intuitive, so that people don't have to manually infer which type of connection it is.
I like the idea that emerged during the conference call. To record it here, first, we will change ||
to a comma. Then we can have:
a.out, b.out -> c.in, d.in;
which requires both sides to balance, or
(a.out, b.out)+ -> c.in, d.in;
which repeats the pattern on the left one or more times to match the width on the right (and requires the width on the left to divide the width on the right), or
a.out, b.out+ -> c.in, d.in;
which repeats b.out enough times to balance the right side (and requires an integer number of repetitions to get balance).
My original use case would be written rather simply as
a.out+ -> t.in;
This looks good! I only have one small remark: The +
in your last example does not immediately fall into the eye and could be missed easily. I am thinking that maybe we should enforce parenthesis in conjunction with +, to emphasize it visually:
(a.out)+ -> t.in;
But we could also leave that open for the programmer to decide.
I have implemented a limited form of this broadcast connection, documented here. I have also implemented support for after
in the C target, but not yet in Cpp. I'm going to go ahead and merge this branch with master. It turns out that I didn't really need the arithmetic expressions for widths in order to do this. Instead, I realized a limited mechanism where the width of the bank of delay reactors is inferred. This is possible because the width of the connection that is replaced by this bank is not inferred but rather is explicit. If we later introduce any form of width inference, this mechanism will also have to be enhanced.
I have discovered yet another subtlety with banks of reactors. In the example below, a multiport of width 2 sends to a bank of reactors, each of which has an ordinary port as input. Unfortunately, the reactions in the bank are triggered even if their sole trigger is absent. This means that a reactor that is put into a bank always has to test inputs for presence. This is rather unfortunate and goes against our language specification.
/**
* This test checks that a downstream reaction is triggered only if its trigger is present.
*/
target C {
timeout: 1 sec
};
reactor Source {
output[2] out:int;
state count:int(0);
timer t(0, 200 msec);
reaction(t) -> out {=
if (self->count++ % 2 == 0) {
SET(out[0], self->count);
} else {
SET(out[1], self->count);
}
=}
}
reactor Destination {
input in:int;
reaction(in) {=
if (!in->is_present) {
fprintf(stderr, "Reaction to a triggered even though all inputs are absent!\n");
exit(1);
}
=}
}
main reactor TriggerDownstreamOnlyIfPresent2 {
s = new Source();
d = new[2] Destination();
s.out -> d.in;
}
I don't see a clean way to fix this because there is no way to statically know which of the reactors in the bank gets sent a message.
An unclean way to fix might be to insert code at the start of every reaction that returns if none of the triggers is present. This will add overhead, particularly if the reactor is already testing for presence. Suggestions?