graphql-ruby
graphql-ruby copied to clipboard
Fields "missing" randomly
Describe the bug
I have a type which has the following field:
field :pit_logs, [Types::PitLogType], null: false, scope: false do
argument :dates, [String], required: false, default_value: '[]'
end
The query works fine (most of the times):
{
initiative(id: "902fba2e-08b9-4848-a994-d1dc02ab2c00") {
id
pitLogs(dates: ["2024-01-11", "2023-01-01"]) {
date
progress
}
}
}
And it returns the correct reply:
{
"data": {
"initiative": {
"id": "902fba2e-08b9-4848-a994-d1dc02ab2c00",
"pitLogs": [
{
"date": "2023-01-01",
"progress": 100
},
{
"date": "2024-01-11",
"progress": 100
}
]
}
}
}
However after some time I start to see the following errors returned from the GraphQL api:
{
"errors": [
{
"message": "Field 'pitLogs' doesn't exist on type 'Initiative'",
"locations": [
{
"line": 4,
"column": 5
}
],
"path": [
"query",
"initiative",
"pitLogs"
],
"extensions": {
"code": "undefinedField",
"typeName": "Initiative",
"fieldName": "pitLogs"
}
}
]
}
I have not been able to reproduce the bug in development. It only appears in production as far as I can tell.
Versions
graphql (2.2.8)
graphql-batch (0.5.4)
graphql (>= 1.12.18, < 3)
promise.rb (~> 0.7.2)
graphql-c_parser (1.0.7)
graphql
graphql-persisted_queries (1.7.0)
graphql (>= 1.12)
Ruby on Rails 7.1.3, Ruby version 3.2.3
GraphQL schema
Include relevant types and fields (in Ruby is best, in GraphQL IDL is ok). Any custom extensions, etc?
# frozen_string_literal: true
module Types
class InitiativeType < Types::BaseObject
pundit_role nil
description 'Initiative'
field :id, ID, null: false
field :pit_logs, [Types::PitLogType], null: false, scope: false do
argument :dates, [String], required: false, default_value: '[]'
end
end
end
GraphQL query
Example GraphQL query and response (if query execution is involved)
{
initiative(id: "902fba2e-08b9-4848-a994-d1dc02ab2c00") {
id
pitLogs(dates: ["2024-01-11", "2023-01-01"]) {
date
progress
}
}
}
{
"data": {
"initiative": {
"id": "902fba2e-08b9-4848-a994-d1dc02ab2c00",
"pitLogs": [
{
"date": "2023-01-01",
"progress": 100
},
{
"date": "2024-01-11",
"progress": 100
}
]
}
}
}
and sometimes:
{
"errors": [
{
"message": "Field 'pitLogs' doesn't exist on type 'Initiative'",
"locations": [
{
"line": 4,
"column": 5
}
],
"path": [
"query",
"initiative",
"pitLogs"
],
"extensions": {
"code": "undefinedField",
"typeName": "Initiative",
"fieldName": "pitLogs"
}
}
]
}
Steps to reproduce
I can't reproduce it in development. It seems to be a problem that appears after some time so perhaps a caching issue?
Steps to reproduce the behavior
Expected behavior
I expect the correct data to be returned
Actual behavior
The correct data is returned sometimes - other times just an error message that the field cannot be found.
Wow! That's so strange. Thanks for the detailed writeup.
My first idea is, maybe some types are being hidden from the client, so the field is being hidden. Do you have any visible? methods implemented in your schema? Or any other features that might be using GraphQL-Ruby's "visibility" feature (https://graphql-ruby.org/authorization/visibility.html)?
Another idea is, sometimes this happens when there's an error loading a source (.rb) file during the schema boot. But that would cause it to fail every time, not just sometimes. (That said -- do you run multiple server processes in production? Do you know if this error happens on one server but not other?)
I see that graphql-persisted_queries is in the gemfile, but as far as I know, it wouldn't affect this, because that gem caches query strings, but not results.
Let me know about visibility. Other than that, are there any other plugins or configurations in your schema, or in PitLog that you can share?
Hi
I do not have any visible? methods implemented in any of my schemas. Actually we opt out of that feature in our schema:
use GraphQL::PersistedQueries, compiled_queries: true, store: :memory
use GraphQL::Batch
# Opt out of GraphQL-Ruby's visibility feature:
use GraphQL::Schema::AlwaysVisible
mutation(Types::MutationType)
query(Types::QueryType)
The application server we use is puma and it is started with the following parameters:
bundle exec puma -e production -t 1:5 -w 5 --pidfile tmp/puma.pid -S tmp/status.txt --preload
The pitLogs is as follows:
From the initiative model:
def pit_logs(dates: [])
# Return PIT data for the given dates
GetPitData.call(object: self, dates:).data
end
It simply returns objects as follows:
# frozen_string_literal: true
class PitLog
attr_accessor :date, :progress, :weight, :period_progress_in_percent, :deviation_min, :deviation_max, :deviation_status, :finished_at, :is_finished, :is_finished_overdue
end
The pitLog type is as follows:
# frozen_string_literal: true
module Types
class PitLogType < Types::BaseObject
pundit_role nil
description 'PIT data'
field :date, String, null: false
field :progress, Integer, null: false
field :weight, Float, null: false
field :finished_at, GraphQL::Types::ISO8601DateTime, null: true
field :period_progress_in_percent, Integer, null: false
field :deviation_min, Integer, null: false
field :deviation_max, Integer, null: false
field :deviation_status, String, null: false
field :is_finished, Boolean, null: false
field :is_finished_overdue, Boolean, null: false
end
end
I have investigated further. We are running several docker images with the same application deployed. It is random if a node returns the error or the correct data; so the issue is not due to one node returning an error and the other nodes returning correct data. It is completely random.
After some time all the nodes will fail. So it might be some caching / parsing / stale object problem?
The returned "documentation" from the server also changes; some times the pitLogsentry is in the docs; other times not.
Ok, thanks so much for sharing all these details. AlwaysVisible is certainly the most dubious of all the plugins -- I'm going to dig in there and see if I can spot any bugs that might make this fail sometimes...
Feel free to contact me directly so you can have a deeper look into the source code. [email protected]
One more clarifying question: when a node returns an error response, does it keep returning error responses after that? Is it "stuck" in that state, or does it go back to returning (mostly) correct data responses?
Also, was there any change to the system that made this start happening? (Eg gem upgrade, schema change?) Or did you just notice that it's been happening, but not sure when it started?
I've been investigating AlwaysVisible, but I just can't come up with anything that would create this irregular behavior -- no caching (that's the point). And besides, what would cause this to be hidden 🤔 I can only think of two reasons that would be true:
visible?has returned false -- but you said that there aren't any visibility implementations in your app. And besides,AlwaysVisiblewould skip those calls even if the method was defined.- The field really isn't defined. But obviously it is...
- Ok, a third one: is the GraphQL Schema's internal list of fields getting messed up? Each type has a hash of fields (source), and if somehow that was modified during execution, maybe it would cause behavior like this. But how could it be ... I don't know of any API for removing entries from that Hash.
I'm at a loss for clues ... could you share the code of your BaseField and BaseObject classes? Maybe there's some clue in there. Feel free to email me ([email protected]) if you prefer.
Hi
# frozen_string_literal: true
module Types
class BaseObject < GraphQL::Schema::Object
include GraphQL::Pro::PunditIntegration::ObjectIntegration
pundit_role nil
field_class Types::BaseField
end
end
# frozen_string_literal: true
module Types
class BaseField < GraphQL::Schema::Field
# Add the Pundit integration:
include GraphQL::Pro::PunditIntegration::FieldIntegration
# By default, don't require a role at field-level:
pundit_role nil
argument_class Types::BaseArgument
end
end
Please note that we are using GraphQL Pro.
Thanks for sharing those. Unfortunately, I don't see any clues there at all 😖
As a matter of simplifying and ruling out some possibilities, could you deploy without AlwaysVisible and/or GraphQL::PersistedQueries? I think they're both performance improvements, but IIUC, the API should work with one or the other, or both, disabled for a short time. Would that be possible, even in a staging environment, if this error happens there?
Is it just this field that has this bug, or does it happen to other fields, too? If it's just this field, then I guess we should look for unique configurations that might affect just pitLogs. (Nothing in the code above gives me any clue ... maybe there's some global handling for things named "log"? Maybe some strange error somehow coming from GetPitData? But I don't even know how something from that runtime code could make a field disappear....) But, if it's happening to other fields too, then I guess we should look for something higher-level, that would affect all the fields.
This is a tough one 😖
Update: I have run tons of tests via RSpec shooting off 100K requests to the endpoint without any issues. I have tested the application with production settings also and I couldn't replicate the issue. It makes me think that it might be an issue with the application server (Puma). So I have changed to Passenger for now and will see how that works compared to Puma. I literally can't figure out how the app and the GraphQL gem should suddenly change - so my main suspect is Puma and perhaps how threads are handled when spawned from the workers. I will let you know my findings.
Sadly the bug persists:
{
"errors": [
{
"message": "Field 'pitLogs' doesn't exist on type 'Initiative'",
"locations": [
{
"line": 4,
"column": 5
}
],
"path": [
"query",
"initiative",
"pitLogs"
],
"extensions": {
"code": "undefinedField",
"typeName": "Initiative",
"fieldName": "pitLogs"
}
}
]
}
Ok I found the error!
When an object is updated we send out requests to connected clients to fetch queries from the backend. In order to do that we construct the GraphQL query to be performed by the webapp (Angular) to the backend.
For a given object I have an interactor that will return the GraphQL query. However I filter out some fields that should not be included in the query.
How to replicate the bug.
Let us check to see if the Types::InitiativeType has a pitLogs key (it should have):
irb(main):036> "Types::InitiativeType".constantize.own_fields.keys.include?("pitLogs")
=> true
Now lets do the same except filtering out the pitLogs field:
irb(main):037> "Types::InitiativeType".constantize.own_fields.except!("pitLogs").keys.include?("pitLogs")
=> false
So far so good. But now when I check to see if pitLogs is a key I get this:
irb(main):038> "Types::InitiativeType".constantize.own_fields.keys.include?("pitLogs")
=> false
It should return true in this case!
Reloading the app solves the issue:
irb(main):039> reload!
Reloading...
=> true
irb(main):040> "Types::InitiativeType".constantize.own_fields.keys.include?("pitLogs")
=> true
Does the .constantize in any way alter the definition of Types::InitiativeType - it seems like it is the case?
.except! is typically used with hashes when you want to modify the original hash directly. Be cautious with .except! since it changes the hash it's called on, which might affect other parts of your code relying on the original state of the hash.
So I guess that it would be correct to use .except instead of .except! since the latter would change the keys of the Types::InitiativeType and thereby cause this error.
It would be ok to close this ticket now since it is not an error in the gem.
use
.exceptinstead of.except!
Yes, I'd agree that's the best way to go forward. It's nice to make sense of what was going on! Thanks for sharing all your findings.
As of .constantize, I wouldn't expect it to modify the definition, but reload would certainly do the trick. reload would tell Rails to forget all the constants it loaded and rebuild them from source. So that would make the definition's own_fields hash go back to its initial state.
For the record, one possibility here would be for GraphQL-Ruby to call freeze on all its definition state objects (like this Hash), but the challenge is that we can't really be sure when definitions are "finished". Ruby being Ruby, there are always ways to continue modifying classes. It might be possible to add .freeze/.dup to public APIs for modifying definitions, so that only those intended methods could be used, but I don't think that'd be work the overhead right now. (It would add overhead to definiting the schema during application boot, slowing down development and deployment.)