graphql-schema-registry
Add schema definition breakdown feature
Hello @tot-ra :wave:
We are planning to add some features mentioned in the roadmap, starting by adding a schema usage breakdown (Query, Mutations, Scalars, Objects...). The aim is to store all the schema definitions to be able to display them and also allow next steps such as usage tracking.
Backend changes
We would like to create some new MySQL tables with the distribution shown below:
Having this, we could parse the `type_defs` received on the `schema/push` endpoint into the new tables. Furthermore, we will add new GraphQL queries for the frontend to consume this data.
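To illustrate, a minimal sketch of that parsing step (assuming graphql-js for parsing; the row shapes and names here are just an example, not the final table design):

```typescript
// Sketch only: walk the pushed type_defs and derive rows for the proposed tables.
// Row shapes and naming are illustrative, not the final design.
import { Kind, parse, print } from 'graphql';

export function breakDownTypeDefs(typeDefs: string) {
  const doc = parse(typeDefs);
  const typeRows: { name: string; kind: string }[] = [];
  const fieldRows: { parentType: string; name: string; type: string }[] = [];

  for (const def of doc.definitions) {
    if (def.kind === Kind.OBJECT_TYPE_DEFINITION) {
      typeRows.push({ name: def.name.value, kind: 'Object' });
      for (const field of def.fields ?? []) {
        fieldRows.push({
          parentType: def.name.value,
          name: field.name.value,
          type: print(field.type), // e.g. "[String!]!"
        });
      }
    } else if (def.kind === Kind.SCALAR_TYPE_DEFINITION) {
      typeRows.push({ name: def.name.value, kind: 'Scalar' });
    }
    // ...interfaces, enums, unions and inputs would be handled the same way
  }

  return { typeRows, fieldRows };
}
```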
Frontend changes
Consume the new backend GraphQL queries and add new pages to present all the schema definitions.
Thank you for your time, and nice work! :smiley:
Hey. Thanks for the interesting topic; we do have schema usage in PD ourselves too (though it's not as nested). Some questions:
- How is the UI/API going to look?
- Let's say I make a query `{ user { name } }`. Do you plan to insert a new record for every query into the operation, type & field tables? Otherwise I don't see exactly how you're getting a schema usage breakdown (by property). If you do that, then this is not going to scale very well, because we can get a lot of queries & a lot of properties.
- Why do you need fields like `is_nullable` and `is_array` [for usage]?
Hello!
- How is the UI/API going to look?

It is going to be similar to other solutions on the market; I can share some prototyping tomorrow.
- Do you plan to insert a new record for every query into the operation, type & field tables?

That is the plan. The idea is to have control over which fields are inside each operation, which should not be an issue. On the other hand, we plan to store the usage of those fields on the operations, but not forever: the idea is to keep only the records from the last 30 days, otherwise there could be a decrease in performance.
Hello 👋
- Why do you need fields like `is_nullable` and `is_array` [for usage]?

We decided to add these columns to the fields table because we are planning to be able to represent data like the following: `[String!]!`. For this example it will be `is_array=true`, `is_nullable=false` and `is_array_nullable=false`.
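For illustration, a minimal sketch of how those flags could be derived from a field's type node with graphql-js (the function name is just an example):

```typescript
import { Kind, TypeNode } from 'graphql';

// Sketch: unwrap a field's TypeNode into the proposed columns.
// For [String!]! this yields is_array=true, is_nullable=false, is_array_nullable=false.
// Nested lists like [[String]] are out of scope for this sketch.
function describeFieldType(type: TypeNode) {
  let node = type;
  let isArray = false;
  let isNullable = true; // nullability of the named (inner) type
  let isArrayNullable = true; // nullability of the list itself

  if (node.kind === Kind.NON_NULL_TYPE) {
    // The outermost "!" applies to the list if there is one, otherwise to the named type
    node = node.type;
    if (node.kind === Kind.LIST_TYPE) {
      isArrayNullable = false;
    } else {
      isNullable = false;
    }
  }
  if (node.kind === Kind.LIST_TYPE) {
    isArray = true;
    node = node.type;
    if (node.kind === Kind.NON_NULL_TYPE) {
      isNullable = false;
      node = node.type;
    }
  }

  // node is now the NamedType, e.g. String
  return {
    type: node.kind === Kind.NAMED_TYPE ? node.name.value : '',
    isArray,
    isNullable,
    isArrayNullable,
  };
}
```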
- `{ user { name } }`

For this example, we will need to store the query in the `operation` table, `name` in the `field` table and, assuming `name` is of type `String`, we also need to add `String` in the `type` column as a Scalar. With all of that data stored, we can know the usage for the query and also for the attribute `name`, because we will be able to register the usages in the `requested_fields` and `requested_operations` tables.
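For illustration, the rows for this example could look roughly like this (shapes are purely illustrative, reusing the table names from this thread):

```typescript
// Illustrative only: roughly what { user { name } } would add or track,
// using the table names discussed in this thread.
const typeRow = { name: 'String', kind: 'Scalar' };
const fieldRow = { parentType: 'User', name: 'name', type: 'String' };

// Usage side, kept only for the last 30 days
const requestedOperation = { operation: 'user', kind: 'Query', hits: 1 };
const requestedField = { field: 'User.name', hits: 1 };
```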
what are the fields inside each operation

we also need to add the String in the...
That's not going to scale. Here at Pipedrive, we serve >8k requests per minute. That's 8k INSERTs even if you assume only one field is requested per query. At that rate (8,000 × 60 × 24 × 30 ≈ 345M), your MySQL table would have 345M rows by the end of the month.
I would suggest considering this kind of architecture:
- The gateway needs to send the requested query to some queue (pub/sub Redis, or better, Kafka).
- Then some piece of code, preferably written in golang so it can efficiently utilize all CPUs, would fetch the query, parse it into an AST, use graphql's visitor to go through all graph nodes, increase the property count (usage) & store it in memory (see the sketch after this list).
- In the graphql visitor you need to map the queried field onto the current live schema, because `{ user { name } }` has no knowledge of the `User` type.
- Then once in ~1 minute, it would take the data from memory and flush it to MySQL (with a bulk insert).
- The basic & most valuable information is hits per day per property (`User.name: 1`).
- Periodically, you need to clean up old usage info. I'd suggest 5-day usage retention, but for smaller projects I guess it makes sense to have 30 days.
- The more granular & more connected the data you need, the more disk space you need, so these values should be configurable. Ideally you shouldn't have more than 1M rows in a table.
- As for golang: it doesn't matter that much, as long as this processing can be moved to a separate dockerized process. The DB can remain the same.
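A rough sketch of the middle steps (shown here with graphql-js in TypeScript rather than Go; the queue consumer and the persistence call are stand-ins, not code from this repo):

```typescript
// Sketch of the suggested worker: count field usage in memory, flush periodically.
// countQuery() would be fed by whatever queue consumer delivers raw query strings.
import { GraphQLSchema, parse, visit, visitWithTypeInfo, TypeInfo } from 'graphql';

const hits = new Map<string, number>(); // "User.name" -> count

export function countQuery(schema: GraphQLSchema, query: string) {
  const typeInfo = new TypeInfo(schema);
  visit(
    parse(query),
    visitWithTypeInfo(typeInfo, {
      Field(node) {
        // Map the queried field onto the live schema to recover its parent type
        const parentType = typeInfo.getParentType();
        if (!parentType) return;
        const key = `${parentType.name}.${node.name.value}`;
        hits.set(key, (hits.get(key) ?? 0) + 1);
      },
    }),
  );
}

// Once in ~1 minute, drain the counters and bulk insert into MySQL
setInterval(async () => {
  const rows = [...hits.entries()].map(([property, count]) => ({ property, count }));
  hits.clear();
  if (rows.length > 0) {
    await persistUsage(rows); // e.g. a single multi-row INSERT / knex.batchInsert
  }
}, 60_000);

async function persistUsage(rows: { property: string; count: number }[]) {
  // stand-in for the actual bulk insert
}
```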
Hello :)
We discussed your suggestion internally, and for now we are going to focus on the breakdown queries when a schema is received on the `schema/push` endpoint. Meanwhile, we are going to explore a new solution for the schema usage and share it on this thread again.
Thanks for your patience
breakdown queries
Do you mean that when someone pushes the schema, you want to parse `type_defs` and save it in relational form? I guess that may help to build a UI where you can focus on a specific entity or property (like Apollo Studio does). The possible problem there is that it may become inconsistent with the actual `type_defs` that are stored as text, so I assume the text form will remain the source of truth.
Exactly as you said. We will store everything in the database tables to be able to display the model similarly to how Apollo Studio does. And yes, the text form will be the source of truth.
Hello @tot-ra, as mentioned before, we are going to start working on the breakdown feature before planning the schema usage feature. We would like to know your opinion on schema updates (new type, modifying a field, removing a query...): if we encounter a breaking change, since we don't know whether the change is being used by anyone, we are planning to add a header on the `/schema/push` HTTP POST as a "force" mechanism to allow the schema update. By default it will be false, so if we encounter a breaking change, it won't be possible to update the schema.
As soon as the usage feature is working, we will change this behaviour: updating a schema that contains a breaking change will only be allowed if the broken part is not being used.
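To make the idea concrete, a hypothetical sketch of that check (the header name, the Express-style handler and the helper are assumptions, not the actual registry code):

```typescript
// Hypothetical sketch of the proposed "force" escape hatch on /schema/push.
// Header name, payload fields and helper are assumptions, not the registry's real API.
import express from 'express';
import { buildSchema, findBreakingChanges } from 'graphql';

const app = express();
app.use(express.json());

app.post('/schema/push', async (req, res) => {
  const force = req.headers['x-force-push'] === 'true'; // defaults to false
  const currentTypeDefs = await loadCurrentTypeDefs(req.body.name); // stand-in
  const breaking = currentTypeDefs
    ? findBreakingChanges(buildSchema(currentTypeDefs), buildSchema(req.body.type_defs))
    : [];

  if (breaking.length > 0 && !force) {
    return res.status(400).json({ success: false, message: 'Breaking change detected', breaking });
  }

  // ...persist the schema and its breakdown rows as usual
  return res.json({ success: true });
});

async function loadCurrentTypeDefs(serviceName: string): Promise<string | null> {
  return null; // stand-in for fetching the latest stored type_defs
}
```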
Closing this, let's continue in https://github.com/pipedrive/graphql-schema-registry/issues/146