Nodes with no inputs has empty events stream
If a node does not specify any inputs, it will have no event stream. And the event stream is always returning None.
This is however misleading as we might want to check the event stream for other event than INPUT event such as Stopping event.
Quick example
use std::time::Duration;
use dora_node_api::{self, dora_core::config::DataId, DoraNode, Event, IntoArrow};
fn main() -> eyre::Result<()> {
println!("hello");
let output = DataId::from("random".to_owned());
let (mut node, mut events) = DoraNode::init_from_env()?;
if events.recv_timeout(Duration::from_secs(100)).is_none() {
eprintln!("Evemt is None");
}
Ok(())
}
With an empty input dataflow description, will return immediately instead of timing out on 100s.
Our current behavior is that an event stream is closed as soon as all of its inputs are closed. This way, we enable dataflows to finish "naturally" without requiring a manual stop.
The "stop" event is only sent to signal a premature exit caused by a manual stop operation.
When a node has no inputs, it will never receive any input events. So there is no way to prematurely stop the input stream, as it's already finished from the start. In other words, a node with no inputs always finishes "naturally" even if a manual stop command is sent because the manual stop always happens after the last input has already been closed.
We can of course discuss whether we want to change this behavior. However, if we always wait for a manual stop command, it's no longer possible to have dataflows that finish "naturally".
Could you give more details on your use case for a node without inputs?
Sure thing. Basically, I have some source nodes that generates output without any input.
I guess, the thing is I wish to make them stop when I click on Ctrl+C but I don't have any easy way to do that at the moment.
We don't have to fix this just now.
Some ideas how we could support this use case:
- Add a some sort of dummy input to the source node that never fires. We could e.g. provide a built-in
dora/neverinput for this use case. - Add a config key to allow nodes to wait for manual stop, e.g. via the YAML file or as argument to
init_from_env. - Add a new
EventChannelClosedevent that is sent immediately before the stream returnsNone. This event could contain a manual stop channel so that it is possible to wait for manual stop commands even after the event stream returnedNone.
We're also using a number of input-only nodes, and this issue makes them somewhat difficult to shut down cleanly. Our current workaround is to use a dora/timer with a long timeout and ignoring those messages when they are sent.
My two cents:
I tend to agree that the current behavior is surprising and can easily trip people over when None is unexpectedly returned.
I am not sure the distinction between a graceful STOP and "all input streams have been closed" is worth the above friction.
I also think that if a node doesn't have any inputs, it's reasonable to add a synthetic dora/never input so it won't stop immediately. It's likely not what people expect.
In our project, we use a small wrapper that raises a StopIteration exception when STOP is received and only returns INPUT events, so if we have nested event loops (for sub-behaviors), they don't need to handle STOP everywhere:
for event in safe(node):
match event["id"]:
...
The event iterator will finish as soon as there are no more events available for this node. The stop event is only sent when a dataflow is stopped early by an external command (ctrl+c or dora stop). It is not sent when the dataflow finishes on its own (e.g. a source node that sends 10 outputs and then exits).
So if you do not require special handling of early stops, you should be able to just ignore the stop event. The iterator will finish afterwards anyway.
I can see how this behavior can be surprising, though. The stop event is currently a stop_sent_before_inputs_were_closed event, which is not obvious. I'm open to changing this behavior in general if we find a good alternative.
I guess we need to decide on the following questions:
- Should the
eventsiterator finish at some point or do we always want to require a manualbreak?-
My opinion: It's easy to forget the manual
break. If the iterator never finishes this would lead to an endless loop, which would have to be killed after a timeout ondora stop. I suspect that this would be a common mistake, so I'm in favor of having a natural end of the iterator. It's much nicer to write too because you can just handle the events that you're interested in and ignore the rest.
-
My opinion: It's easy to forget the manual
- Do we want to keep dataflows that finish on their own, i.e. without a manual stop?
- My opinion: Dataflows that finish on their own can be useful, so it's worth keeping this functionality. For example, you might want to do some processing work on a folder of images. It's nice if the dataflow is able to finish on it's own once all images have been processed.
- Are we okay with treating source nodes differently than other nodes? Or should we continue to treat all nodes the same?
- My opinion: I don't think that we should special-case source nodes and let them behave differently. I think it would be a huge footgun if adding an input changed the stop behavior.
Unfortunately I don't have any good proposal that satisfies all my answers to the above questions. So I guess we have to compromise somewhere...
I guess with good documentation, I would be fine with adding special treatment for source nodes (i.e. question 3). A synthetic input is a good idea, but it comes with drawbacks. For example, a dora/never input requires a manual break. A dora/stop input that closes after the stop would avoid this, but then it's unclear whether it will be sent as a Event::Input or an Event::Stop type.
How about an additional wait_for_stop: bool field that defaults to true for source nodes and false for nodes with inputs?
We could document it like this:
wait_for_stopThe
wait_for_stopfield controls whether the event stream should stay open until astopevent is received. Stop events are normally triggered by adora stopcommand orctrl-c. Ifwait_for_stopisfalse, the event stream closes after all inputs streams are closed (i.e. the corresponding outputs finished).The default value for
wait_for_stopis:
- Nodes with no inputs (so-called "source nodes") default to
wait_for_stop: true.- Nodes with inputs default to
wait_for_stop: false.
What do you think?
If this task is not urgent, I would like to try solving this problem as I feel it would be a great starting point for my contribution to Dora. @haixuanTao @phil-opp