data-api-builder

[Enhancement]: gql operations are not atomic.

Open JerryNixon opened this issue 2 years ago • 4 comments

What happened?

Imagine a mutation like this:

mutation scenario {
  createbook(item: { id: 1, title: "Test", publisher_id: 1234 }) {
    title
  }
}

This creates one book with id equal to 1.

Now imagine a mutation like this:

mutation scenario {
  createbook(item: { id: 1, title: "Test", publisher_id: 1234 }) {
    title
  }
  createbook(item: { id: 2, title: "Test", publisher_id: 1234 }) {
    title
  }
}

This creates two books with ids equal to 1 and 2.

Now imagine a mutation like this:

mutation scenario {
  createbook(item: { id: 1, title: "Test", publisher_id: 1234 }) {
    title
  }
  createbook(item: { id: 1, title: "Test", publisher_id: 1234 }) {
    title
  }
}

This creates one book with id equal to 1 and also produces a failure for the second operation.

The nature of the problem is that the failure is unpredictable to the developer, and the rollback is (or could be) egregiously difficult, especially if AFTER triggers, CDC, temporal tables, etc. are involved.

This problem gets a lot worse now that we support multiple data sources.

Imagine a mutation like this:

mutation scenario {
  createbook(item: { id: 1, title: "Test", publisher_id: 1234 }) {
    title
  }
  createinventory(item: { id: 1, quantity: 123 }) {
    quantity
  }
}

Suppose that createbook fails and createinventory does not. If one table lives in an on-premises database and the other in a cloud database (just as an example), the pain to the developer, and the complexity required to undo the unexpected partial result, is very high; in some cases, too high. As a result, one of the key benefits of GraphQL is lost: grouping operations is still possible, but it is so dangerous that it makes far more sense to send one operation at a time. This issue escalates when we start supporting nested mutations.
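To make the cross-source hazard concrete, here is a minimal sketch (Python with sqlite3 standing in for the two databases; none of this is DAB code) of what happens when each operation commits independently:

import sqlite3

# Two independent stores, standing in for an on-prem and a cloud database.
books = sqlite3.connect(":memory:")
inventory = sqlite3.connect(":memory:")
books.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
inventory.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, quantity INTEGER)")
books.execute("INSERT INTO book VALUES (1, 'Existing')")  # id 1 is already taken
books.commit()

# createbook: hits the duplicate key and rolls back on its own connection.
try:
    books.execute("INSERT INTO book VALUES (1, 'Test')")
    books.commit()
except sqlite3.IntegrityError:
    books.rollback()

# createinventory: runs in its own transaction and commits successfully,
# leaving a stock row for a book that was never created.
inventory.execute("INSERT INTO stock VALUES (1, 123)")
inventory.commit()

print(inventory.execute("SELECT * FROM stock").fetchall())  # [(1, 123)] -- orphaned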

Solution

I believe our current implementation is incomplete, and perhaps even flawed, because it does not support atomicity. I realize we could easily disagree about whether this is a bug. But it can be resolved with a very small change: each sub-operation can defer the commit of its individual transaction until every operation has completed successfully. At that point, every operation can commit or every operation can roll back, based on a new configuration setting:

{
    "graphql": {
        "atomicity": true
    }
}

Aside: atomicity is the technical term, but we might prefer something less technical like allornone.

The default of this value seems like it should be true.
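For illustration, here is a minimal sketch of the deferred-commit idea (Python with sqlite3; run_atomic and the lambda operations are hypothetical stand-ins, not DAB's actual resolver pipeline):

import sqlite3

def run_atomic(connection, operations):
    """Execute every operation first; only then commit all, or roll back all."""
    try:
        for op in operations:
            op(connection)      # each sub-operation defers its commit
        connection.commit()     # every op succeeded: commit once
    except sqlite3.Error:
        connection.rollback()   # any failure: undo every op
        raise

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
db.commit()

ops = [
    lambda c: c.execute("INSERT INTO book VALUES (1, 'Test')"),
    lambda c: c.execute("INSERT INTO book VALUES (1, 'Test')"),  # duplicate key
]
try:
    run_atomic(db, ops)
except sqlite3.Error:
    pass
print(db.execute("SELECT COUNT(*) FROM book").fetchone())  # (0,) -- nothing persisted

Note that this sketch covers a single data source; doing the same across multiple sources would require some form of distributed or compensating transaction.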

Here's a screen showing how frustrating this would be to a developer:

[screenshot: the error response as seen by the developer]

Version

All versions

What database are you using?

Azure SQL

What hosting model are you using?

Local (including CLI)

Which API approach are you accessing DAB through?

GraphQL

Relevant log output

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

JerryNixon avatar Oct 05 '23 21:10 JerryNixon

I'm unable to find any explicit guidance on running multiple operations of a mutation request in a single transaction in the GraphQL Mutation Specs.

Current Behavior:

Consider a GraphQL mutation such as:

mutation {
    createBook(id: 1, publisher_id: 1234, title: "Book title") {
        id
        title
        publisher_id
    }
    createBook(id: 2, publisher_id: 1234, title: "Book title") {
        id
        title
        publisher_id
    }
}

For processing such a request, a separate transaction is created for each createBook operation. A single createBook operation executes multiple database queries, and all of those queries run in one transaction.
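A rough model of that per-operation scoping (Python with sqlite3; illustrative only, not DAB's implementation):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
db.commit()

ops = [
    lambda c: c.execute("INSERT INTO book VALUES (1, 'Book title')"),
    lambda c: c.execute("INSERT INTO book VALUES (1, 'Book title')"),  # duplicate key
]
for op in ops:
    try:
        op(db)
        db.commit()    # each operation commits independently...
    except sqlite3.Error:
        db.rollback()  # ...so only the failing operation is undone

print(db.execute("SELECT COUNT(*) FROM book").fetchone())  # (1,) -- first insert survived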

If there are multiple operations in the same mutation request and there is some dependency between them, that dependency is business logic. The client application seems to be the right place to handle it, rather than DAB assuming the operations are related and executing them in a single transaction.

Things to consider

Here are some things to consider when executing multiple operations of a mutation request in a transaction.

  1. Users might want to execute operations that are unrelated to each other.
    For example: there could be createBook and createStock operations in the same mutation request. There is no dependency between the two, and the success/failure of one should not affect the other. But by setting atomicity: true, users would lose this independence.

  2. When operations are related, that relationship is business logic, and the client application seems the better place to handle it. Executing such dependent operations in the same mutation request does not seem like an ideal choice.

For example,

   mutation {
       createBook(...)
       updateInventory(...)
   }

In this example, we would want to update the inventory only when createBook succeeds. In some cases, it might even be acceptable to leave the books table as-is when updateInventory fails.

This is business logic and, IMO, the client application is the best place to handle it.

It seems intuitive to let the client application handle dependent operations in a way that is needed for their business logic.

  3. If there is a single long-running operation in the mutation request, it could lock resources in other tables (or even other databases, now that we support multiple database types), since the other operations would have to wait for that operation to commit or roll back.

  4. The GraphQL result, through its errors section, lets the client application know the exact list of operations that failed (see the illustrative response below). If the operations do not run in one transaction, retrying only the failed operations is sufficient (this depends on the exact cause of failure; for this discussion I'm taking the case of an intermittent failure, where retrying can make the operation succeed). But when they run in one transaction, the successful operations also get rolled back, forcing the client application to retry all of them.
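For reference, a hand-written example (not captured from DAB) of the response shape described in point 4, following the GraphQL spec's errors format; the field names and message text are hypothetical:

{
  "data": {
    "createBook": { "id": 1, "title": "Book title", "publisher_id": 1234 },
    "createStock": null
  },
  "errors": [
    {
      "message": "A transient error occurred while executing createStock.",
      "path": [ "createStock" ]
    }
  ]
}

Here the client can see that only createStock failed and retry just that operation.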

My thoughts: Considering all of these points, it seems like we shouldn't encourage executing multiple dependent operations (where the success/failure of one operation affects all the others) in a single request. [These are my initial thoughts based on reading the issue; I'm open to other ideas/suggestions.] Please share scenarios where it would be beneficial to execute all the operations of a mutation request in one transaction (preferably cases where it would be better to handle this in DAB instead of the client application).

severussundar avatar Oct 09 '23 17:10 severussundar

Considering all of these points, it seems like we shouldn't encourage executing multiple dependent operations (where the success/failure of one operation affects all the others) in a single request.

This can't be how we approach it. We can't say "we allow it" but "we don't like it" at the same time. If there is a fundamental problem with it, we need to remove the capability. If there is not, then we need to support it. The easy way to support it is with atomicity as described above.

JerryNixon avatar Oct 10 '23 21:10 JerryNixon

Do we want to introduce a fail-early mechanism as part of this workstream so that if the first mutation in a list of mutations fails, the others do not get a chance to execute because "all or nothing" can't be honored?

I re-read the Mutation section of the GraphQL specification that @severussundar linked and it specifically states mutations are executed serially. I do acknowledge that the specs do not state whether the mutations should/shouldn't be atomic.

It is expected that the top level fields in a mutation operation perform side-effects on the underlying data system. Serial execution of the provided mutations ensures against race conditions during these side-effects.

Reference: GraphQL Specification Oct 2021 Mutation

This is also called out on graphql.org/Learn

A mutation can contain multiple fields, just like a query. There's one important distinction between queries and mutations, other than the name:

While query fields are executed in parallel, mutation fields run in series, one after the other.

This means that if we send two incrementCredits mutations in one request, the first is guaranteed to finish before the second begins, ensuring that we don't end up with a race condition with ourselves.

Since mutations will not execute concurrently, we can fail the request after the first mutation failure.
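A minimal sketch of that fail-early behavior (Python with sqlite3; the operation names echo the incrementCredits example above, and everything here is hypothetical, not DAB code):

import sqlite3

def run_serially_fail_fast(db, operations):
    """Execute operations in order; stop at the first failure and roll
    back, since all-or-none can no longer be honored."""
    for name, op in operations:
        try:
            op(db)
        except sqlite3.Error:
            db.rollback()
            return f"{name} failed; remaining operations skipped"
    db.commit()
    return "all operations committed"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE credits (id INTEGER PRIMARY KEY, amount INTEGER)")
db.commit()

ops = [
    ("incrementCredits1", lambda c: c.execute("INSERT INTO credits VALUES (1, 10)")),
    ("incrementCredits2", lambda c: c.execute("INSERT INTO credits VALUES (1, 10)")),  # duplicate key
    ("incrementCredits3", lambda c: c.execute("INSERT INTO credits VALUES (3, 10)")),  # never runs
]
print(run_serially_fail_fast(db, ops))  # incrementCredits2 failed; remaining operations skipped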

seantleonard avatar Oct 11 '23 19:10 seantleonard

This would be useful for us. We are currently achieving the same thing by performing multiple mutations within a stored procedure, but using DAB directly would be preferable longer term :) (Currently we are looking at deleting multiple related items without leaving orphans.)

pingu2k4 avatar Apr 18 '24 08:04 pingu2k4

Waiting for a customer to identify this issue as a priority again.

JerryNixon avatar Sep 06 '24 16:09 JerryNixon