graphql-anycable icon indicating copy to clipboard operation
graphql-anycable copied to clipboard

Race condition between execute_grouped and read_subscription causing all subscriptions of same fingerprint to fail receiving trigger update

Open thlacroix opened this issue 7 months ago • 1 comments

Describe the bug As mentioned in https://github.com/anycable/graphql-anycable/pull/51#issue-3069184673 , we've identified a race condition that occurs between when we find an existing subscription_id for a fingerprint in https://github.com/anycable/graphql-anycable/blob/v1.3.1/lib/graphql/subscriptions/anycable_subscriptions.rb#L100 and when we read the subscription data (eg to extract the query_string and other parameters) in https://github.com/anycable/graphql-anycable/blob/v1.3.1/lib/graphql/subscriptions/anycable_subscriptions.rb#L165 (eg from https://github.com/rmosolgo/graphql-ruby/blob/v2.5.4/lib/graphql/subscriptions.rb#L106). When the subscription data is read, it might already have been deleted / expired since we got the id and checked its existence.

Because of that, the event will not be executed for the fingerprint, and the result won't be sent to all other still valid subscriptions of the same fingerprint. In 1.2.0 it will just be ignored, and in 1.3.1 it will be processed with empty data which sends error like "No query string was present" to the client (cf my PR above).

Versions ruby: 3.4.3 rails: 8.0.2 graphql: 2.5.4 graphql-anycable: 1.3.1 & 1.2.0 (different behaviors as mentioned above) anycable: 1.5.1

Steps to reproduce With high load (high number of subscriptions on the same fingerprint and high number of trigger events) and a configured subscription_expiration_seconds, clients are receiving "No query string was present" and not receiving trigger updates. We've also added a log before our schema execute to detect when we get and empty query in case of subscription trigger

  def self.execute(query_str = nil, **kwargs)
    if kwargs[:subscription_topic].present? && query_str.nil? && kwargs[:query].nil?
      Rails.logger.error('Nil query string for subscription graphql execution')
    end

Expected behavior All existing subscriptions for a fingerprint should receive their executed triggered event update, even if some of them have expired.

Actual behavior Receiving "No query string was present" errors from the clients and not executing the fingerprinted query for existing subscriptions, when a race condition occurs.

Additional context I understand that there are some back and forth between this lib and the graphql-ruby lib that defines the interface and some plumbing (eg https://github.com/rmosolgo/graphql-ruby/blob/v2.5.4/lib/graphql/subscriptions.rb#L104 & https://github.com/rmosolgo/graphql-ruby/blob/v2.5.4/lib/graphql/subscriptions.rb#L60) which might make it difficult to fix it from this side only; maybe @rmosolgo could have some ideas as well, I might open an issue on the graphql-ruby side as well if it can help.

thlacroix avatar May 16 '25 15:05 thlacroix

Thanks for the report!

We'll take a look at this next week.

palkan avatar May 16 '25 16:05 palkan