elsa-core icon indicating copy to clipboard operation
elsa-core copied to clipboard

Wrong IsLatest value due to exceptions during draft saving in Elsa 2

Open alessandro-gandolfi-ors opened this issue 1 year ago • 4 comments

Since the function to save a new draft doesn't use transactions (as far as I understood, I could be wrong), sometimes it can produce bad situations on the DB, where the "latest" workflow definition has already been changed with IsLatest = 0 but the new draft hasn't been created due to an exception (for example, a bad connection and/or timeout).

The version of Elsa 2 I'm using is 2.11.0 but I have checked and no other changes have been made to the files involved since that version to 2.14.1.

After this error, that definition can't be modified anymore, since there are no definitions with IsLatest = 1 and Elsa can't figure out the new version that should be used, trying to use 1 as new version and not respecting the index IX_WorkflowDefinition_DefinitionId_VersionId on the DB (version 1 is already present).

The first few lines of the exception are usually like this, I just replaced the DefinitionId not to make confusion:

 An unhandled exception has occurred while executing the request.
       Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
        ---> Microsoft.Data.SqlClient.SqlException (0x80131904): Cannot insert duplicate key row in object 'Elsa.WorkflowDefinitions' with unique index 'IX_WorkflowDefinition_DefinitionId_VersionId'. The duplicate key value is (XXXXXXXXXXXXXXXXXXXXXXX, 1).
       The statement has been terminated.

This has happened multiple times, sometimes for bad connection to the DB and other times for reason I couldn't understand from the logs, even when there were no errors regarding the connection.

Screenshot 2024-03-28 110347 image

In the last image there is an example of the versions of a definition where the published version has been modified, so the BE tried and succeded in updating the IsLatest flag to false, but then the new draft hasn't been created.

I have seen that a PR has been merged (#3752) to invert the order of the operations for Elsa 3, backporting the changes to Elsa 2, if possible, could help a lot, even without transactions.

So, if I understood correctly, these operations should be atomic but transactions have not been used. Is there a reason behind this choice or have I the wrong idea about it?

Thanks in advance.

alessandro-gandolfi-ors avatar Mar 28 '24 11:03 alessandro-gandolfi-ors

The issue you're describing seems to be related to a concurrency problem in the Elsa Workflow definitions update process. The error message indicates that there is an attempt to insert a duplicate key row into the WorkflowDefinitions table, which violates the unique index constraint IX_WorkflowDefinition_DefinitionId_VersionId.

This kind of problem can occur when the operation to update the IsLatest flag and the creation of a new draft are not atomic, and an exception occurs between these two operations. If the IsLatest flag is set to false for the current latest version, but the new draft is not created due to an exception, you end up in a state where there is no "latest" version, which causes subsequent operations to fail.

As for why transactions are not used, it's hard to say without more context. Transactions are a common way to ensure atomicity of operations, so their absence might be an oversight or a design decision based on other factors such as performance considerations or the underlying database support for transactions.

To address this issue, you mentioned a pull request (PR #3752) that inverts the order of operations for Elsa 3. Backporting such changes to Elsa 2 could indeed help to mitigate the issue by ensuring that the new draft is created before the IsLatest flag is updated. However, the best solution would be to wrap these operations in a transaction to ensure that either both operations succeed or neither does, maintaining database consistency.

If you are experiencing this issue frequently, it might be worth raising it on the Elsa GitHub repository as a bug or enhancement request, if you haven't already. The maintainers of the project would be best placed to explain the design decisions and potentially address the issue in a future release.

In the meantime, as a workaround, you might consider manually handling the transaction in your code when updating workflow definitions, if the Elsa API allows for it.

glime-ai[bot] avatar Mar 28 '24 11:03 glime-ai[bot]

Elsa 3 has mastered this logic flawlessly. You might consider backporting the logic from there and submitting a pull request.

sfmskywalker avatar Apr 05 '24 19:04 sfmskywalker

I'm asking my boss if I can spend time working on it, it would help us but there is already the idea to migrate to elsa 3 in the future...

Meanwhile, could you elaborate a bit on why those operations are not atomic?

alessandro-gandolfi-ors avatar Apr 10 '24 09:04 alessandro-gandolfi-ors

Yes, the abstraction of the persistence layer—encompassing EF Core, MongoDB, Dapper, and possible third-party implementations—makes handling this from the application layer quite practical, but the downside is the lack of support for performing this as an atomic operation. That being said, this issue is no longer present in Elsa 3, so odds are that if we back port that logic to Elsa 2, the issue goes away.

sfmskywalker avatar Apr 10 '24 09:04 sfmskywalker