decision_context not set during signal-induced DecisionTask
Running the sample code with a reference to decision_context added causes the DecisionTask to fail. The execution history shows DecisionTaskScheduled and DecisionTaskStarted but never DecisionTaskCompleted, and eventually the workflow times out. The cause is that decision_context resolves to nil.
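The symptom in the history (a decision task that starts but never completes) can be checked mechanically. Here is a minimal sketch in plain Ruby. It assumes the event type names have already been pulled out of the execution history as an ordered array of strings. `unfinished_decision_tasks` is a hypothetical helper and is not part of aws-flow.

```ruby
# Hypothetical helper (not part of aws-flow): counts decision tasks that were
# started but never reached DecisionTaskCompleted. A DecisionTaskTimedOut
# closes the in-flight task and counts it as unfinished.
def unfinished_decision_tasks(event_types)
  unfinished = 0
  in_flight = false
  event_types.each do |type|
    case type
    when "DecisionTaskStarted"
      in_flight = true
    when "DecisionTaskCompleted"
      in_flight = false
    when "DecisionTaskTimedOut"
      unfinished += 1 if in_flight
      in_flight = false
    end
  end
  # A task still open at the end of the history also never completed.
  unfinished += 1 if in_flight
  unfinished
end
```

For a history like the one in the report, ending with DecisionTaskScheduled followed by DecisionTaskStarted and no DecisionTaskCompleted, the helper returns 1.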
Here is the modified code to reproduce:
```ruby
require_relative '../../recipe_activities'

class WaitForSignalWorkflow
  extend AWS::Flow::Workflows

  workflow :place_order do
    {
      version: "1.0",
      task_list: "wait_for_signal_workflow",
      execution_start_to_close_timeout: 60,
      task_start_to_close_timeout: 20,
    }
  end

  activity_client(:client) { { from_class: "RecipeActivity" } }

  signal :change_order

  def initialize
    @change_order_period = 30
    @signal_received = Future.new
  end

  def place_order(original_amount)
    timer = create_timer_async(@change_order_period)
    wait_for_any(timer, @signal_received)
    client.process(amount) # note: `amount` is not defined in this method
  end

  def change_order(amount)
    puts workflow_id # raises exception, workflow_id calls decision_context.workflow_context..
    @signal_received.set(amount) unless @signal_received.set?
  end
end
```
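The failure mode can be modelled without SWF at all. This is a simplified sketch. It assumes, as the report suggests, that helpers such as workflow_id read a per-task decision context, and that the executor must set this context before running workflow code. Every name here (`with_decision_context`, `dispatch_with_context`, `dispatch_without_context`) is hypothetical. None of it is aws-flow's actual implementation.

```ruby
# Simplified, hypothetical model of a decision context stored in a
# fiber-local variable (Thread#[] is fiber-local in Ruby). Not aws-flow code.
DecisionContext = Struct.new(:workflow_id)

def current_decision_context
  Thread.current[:decision_context]
end

# Sets the context for the duration of the block, then restores the old one.
def with_decision_context(context)
  previous = Thread.current[:decision_context]
  Thread.current[:decision_context] = context
  yield
ensure
  Thread.current[:decision_context] = previous
end

# Analogous to the workflow_id helper in the report: it assumes a context is
# present and calls a method on it, so a nil context raises NoMethodError.
def workflow_id
  current_decision_context.workflow_id
end

# Dispatch path that sets the context before running workflow code.
def dispatch_with_context(id)
  with_decision_context(DecisionContext.new(id)) { workflow_id }
end

# Dispatch path that never sets it, which is what the report suggests
# happens for signal-induced decision tasks.
def dispatch_without_context
  workflow_id
end
```

In this model the NoMethodError is raised inside the dispatcher. Whether anyone sees it depends entirely on how the caller handles that exception.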
Additionally, the workflow executor does not log these failures anywhere; it simply blackholes exceptions raised in signal-induced DecisionTasks.
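One way to keep such failures from disappearing is to log and re-raise around each signal handler. Here is a minimal sketch. `SignalDispatcher` is a hypothetical stand-in for the part of an executor that invokes signal methods. It uses only Ruby's standard `Logger` and is not aws-flow's API.

```ruby
require "logger"
require "stringio"

# Hypothetical signal dispatcher (not aws-flow's executor).
class SignalDispatcher
  def initialize(logger)
    @logger = logger
    @handlers = {}
  end

  def register(name, &handler)
    @handlers[name] = handler
  end

  # Runs the handler for +name+ and returns its result. Any exception is
  # logged with its backtrace and then re-raised, so the caller can fail the
  # decision task instead of silently dropping the error.
  def dispatch(name, *args)
    handler = @handlers.fetch(name) { raise ArgumentError, "no handler for #{name}" }
    handler.call(*args)
  rescue StandardError => e
    @logger.error("signal #{name} failed: #{e.class}: #{e.message}")
    @logger.error(e.backtrace.join("\n")) if e.backtrace
    raise
  end
end

# Example: a handler that dereferences a nil context, as in the report.
log = StringIO.new
dispatcher = SignalDispatcher.new(Logger.new(log))
dispatcher.register(:change_order) { |_amount| nil.workflow_context }
begin
  dispatcher.dispatch(:change_order, 5)
rescue NoMethodError => e
  puts "decision task would fail with: #{e.class}"
end
puts log.string # the logged error line and backtrace
```

Re-raising after logging matters here. Logging alone would still let the decision task hang until it times out, whereas propagating the exception lets the worker report the failure.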
:+1:
:pray:
+1
@ben-mays Can you provide the code you are using to run the worker/activity_worker/starter? I was hitting a similar issue where I'd get DecisionTaskStarted but never DecisionTaskCompleted, and the workflow would apparently blackhole the error and time out. Bumping to 3.1.0 (the newest release, which for some reason is not in the Gemfile for the samples repo) made it raise the exception properly so I could see my error. After I passed a value to start_execution, the workflow went through correctly (I still get an error, but that's due to amount not being defined in the code snippet given).
@mjsteger Sorry, we're actively moving functionality off of SWF as a result of this and numerous other issues that surfaced: long polling causing tasks to be scheduled on dead sockets, the decision/activity context not being set, and a memory leak that won't go away. I'll leave the issue open for others who may hit the same problem.
@ben-mays Have you written anything about these issues? Did you also use the JVM Flow framework, or are these experiences based solely on the Ruby version? Can you say what you've switched to (I'm assuming custom-grown workflow management on top of a message bus)?