
Retain previously persisted queries for a number of subsequent builds [feature request]

Open · lassombra opened this issue on Mar 9, 2017 · 9 comments

One of the things keeping us from using this in our implementation is that already-published client builds would stop being supported if a query goes away, or if something reorders the queries (and therefore changes their sequential ids). One thought as to how to handle this:

  • Scan the output file for already-defined queries (a production build would update this file and store it in git).
  • Queries present in the output file but not in the input scan are marked in some way (a separate deprecation file with a count? An alteration to the JSON structure to include a deprecated object that contains the old id?).
  • After a fixed number of builds spent in the deprecation file, the query is removed from the JSON file and replaced with a no-op query (to guard against id reuse).
  • New queries are then appended at the end. (A rough sketch of this whole flow follows below.)
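To make that flow concrete, here is a rough sketch of what such a build step might look like. Every name, type, and structure below is hypothetical, not part of persistgraphql's actual output format, and the sketch skips the no-op placeholder by simply never reusing an id:

```ts
// Rough sketch of the proposed build step; every name here is hypothetical.
type QueryMap = { [query: string]: number };
type Counts = { [id: number]: number }; // builds since a query was last seen

function rebuild(
  prevQueries: QueryMap,
  prevCounts: Counts,
  scanned: string[],        // queries found in the current input scan
  maxMissedBuilds: number,  // the proposed command-line option
): { queries: QueryMap; counts: Counts } {
  const queries: QueryMap = {};
  const counts: Counts = {};
  let nextId = Math.max(0, ...Object.values(prevQueries)) + 1;

  for (const [query, id] of Object.entries(prevQueries)) {
    if (scanned.includes(query)) {
      queries[query] = id; // still in the code: keep the old id as-is
      continue;
    }
    const missed = (prevCounts[id] ?? 0) + 1;
    if (missed <= maxMissedBuilds) {
      queries[query] = id; // deprecated, but retained for old clients
      counts[id] = missed;
    }
    // Past the threshold the query is dropped; its id is never reused,
    // because nextId only counts up past the historical maximum.
  }
  for (const query of scanned) {
    if (!(query in queries)) queries[query] = nextId++; // new queries go at the end
  }
  return { queries, counts };
}
```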

The big question I see is how to handle the deprecation data. A separate counts file is one option: it could be committed alongside the persisted JSON but stripped automatically from production builds by any tool that does dead-code removal (since it is build-tool metadata, not used code). Any query that is no longer in the current code gets checked against/added to that counts file, and once it has been missing for x builds (configurable via a command-line option) it gets removed.

The other option I see is to put deprecation data in place of the raw id number in the JSON. This wouldn't affect a client consuming it, though in development the connection could perhaps understand the extra data and emit a warning whenever a deprecated query is hit. The server side is where this changes the most: simply reversing the map is no longer enough, because the deprecation information embedded in the JSON alters the shape of the values.
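As an illustration of this second option, the map entries might look something like the following. The object form is purely this proposal's invention; persistgraphql's real output maps each query string to a plain numeric id:

```ts
// Hypothetical shape of the persisted-query map under the "same JSON" option.
type Entry = number | { deprecated: true; id: number; missedBuilds: number };

const exampleMap: { [query: string]: Entry } = {
  "query CurrentUser { me { id name } }": 1, // live query: plain id, as today
  "query OldProfile { me { id bio } }": { deprecated: true, id: 2, missedBuilds: 3 },
};
```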

Benefits of the two ways I see of storing deprecation information:

Separate file:

  • No changes to APIs
  • Server handling remains really simple: a middleware, with no code needed from this package

Same JSON:

  • Easy to implement various runtime checks in development to emit warnings (see the sketch after this list)
  • The server can monitor and log use of a deprecated query, allowing the dev team to put it back in after a build if it's still regularly used
  • The deprecation cutoff could theoretically be handled server-side in code rather than in compile configuration
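A minimal sketch of the development-time warning this option would enable, reusing the hypothetical entry shape from the sketch above; a lookup helper like this would stand in for the plain invert-the-map step on the server:

```ts
// Resolve an incoming persisted-query id against the hypothetical extended
// map, warning (outside production) when a deprecated entry is hit.
type Entry = number | { deprecated: true; id: number; missedBuilds: number };

function resolveId(id: number, map: { [query: string]: Entry }): string | undefined {
  for (const [query, entry] of Object.entries(map)) {
    const entryId = typeof entry === "number" ? entry : entry.id;
    if (entryId !== id) continue;
    if (typeof entry !== "number" && process.env.NODE_ENV !== "production") {
      console.warn(`Deprecated persisted query ${id} was requested`);
    }
    return query;
  }
  return undefined; // unknown id: reject the request
}
```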

lassombra avatar Mar 09 '17 04:03 lassombra

Hi @lassombra - thanks for opening the issue. Although I agree that this is definitely a valid concern for query versioning, I don't think this package should be responsible for handling the semantics of query deprecation, versioning, storage, etc. since these likely vary widely across different use cases and production environments.

Instead, I think we should implement a simple addition to the tool that allows it to merge a new set of queries into an old set while preserving the query <-> id mapping present in the old set. From there, it shouldn't be difficult to implement the query deprecation semantics you describe in your post, and it will allow you to keep serving the queries that old clients depend on. What do you think?
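For clarity, a minimal sketch of the merge being suggested, under the assumption that ids stay plain sequential numbers as the tool emits today (the function name is made up; nothing like it exists in the package yet):

```ts
// Ids from the old map are preserved; queries new to this build get fresh
// ids above the old maximum; old queries absent from the new scan are kept
// so that already-published clients keep working.
type QueryMap = { [query: string]: number };

function mergeQueryMaps(oldMap: QueryMap, newQueries: string[]): QueryMap {
  const merged: QueryMap = { ...oldMap };
  let nextId = Math.max(0, ...Object.values(oldMap)) + 1;
  for (const query of newQueries) {
    if (!(query in merged)) merged[query] = nextId++;
  }
  return merged;
}
```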

Poincare avatar Mar 09 '17 07:03 Poincare

@Poincare I can see how that information could differ across environments. My reasons for wanting it in this tool are many; I'll lay out the ones I think are pretty universal. Obviously this is a request, not an expectation.

  • The current behavior means that no legacy queries make it into the final build. That can cause various problems for older clients, but from a security standpoint it's great: if a query is discovered to be fetching more information than desired, or is simply big enough to have DDoS potential, it can be removed and it's gone. With merging of query sets, additional work is needed to hunt that query down in the JSON and remove it.
  • With "deprecation" metadata emitted by this tool, only one tool has to scan the source files. If the metadata is emitted as a second file, other tools implementing deprecation could simply read that file alongside this tool's main output, perhaps even at server startup.
  • If this tool doesn't handle deprecation, another build step is required, which can be time-consuming. And if this tool doesn't emit deprecation data at all, that other step has to scan the application for GraphQL files/data itself, duplicating much of the work of this tool.

I think the best compromise would be to export a "counts" file that lists query ids and the number of times the tool has run since each query was last found. A server or a later build step can then use that information to decide whether to throw a deprecation error, emit a deprecation warning (which an implementation outside this project can handle), or simply allow the query through.
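A sketch of the kind of policy check a server or later build step could run against such a counts file; the thresholds and names are made-up illustrations, not anything this tool provides:

```ts
// Decide how to treat an incoming persisted-query id based on how many
// builds have passed since it was last found in the source scan.
type Policy = "allow" | "warn" | "error";

function deprecationPolicy(
  id: number,
  counts: { [id: number]: number }, // builds since each id was last seen
  warnAfter = 1,                    // made-up default thresholds
  errorAfter = 5,
): Policy {
  const missed = counts[id] ?? 0;
  if (missed >= errorAfter) return "error";
  if (missed >= warnAfter) return "warn";
  return "allow";
}
```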

lassombra avatar Mar 09 '17 13:03 lassombra

@lassombra Would it be possible to build this query-counting logic outside of this package rather than within it? I imagine taking the output of this tool and then running a simple build step to merge it with your existing queries (keeping counts as needed) would be pretty straightforward.

Secondly, I don't think it is a good idea for the count to represent the number of times the tool has run since a query was last seen, since that relies on you never mistakenly running persistgraphql more than once per change in your query set. It feels at least somewhat brittle to me.

Poincare avatar Mar 13 '17 03:03 Poincare

It certainly could, and I would somewhat agree with the "somewhat brittle" statement. As I said above, though, my preference is to have only one tool scan the application, so if the preservation logic moves into this tool (which I really think it should), then some kind of removal-reporting logic should as well.

That said, perhaps this should be split into two issues: one for the preservation of existing queries (which I think is reasonably accepted as a good idea), and one for hashing out whether some kind of counting/reporting logic gets added?

lassombra avatar Mar 13 '17 08:03 lassombra

@Poincare do you have any insight into how FB is solving this? It should be a pretty valuable experience to build upon.

tlvenn avatar Apr 18 '17 03:04 tlvenn

@leebyron can you shed some light on this topic? Thanks a lot in advance.

tlvenn avatar Apr 18 '17 03:04 tlvenn

Perhaps using a hash of the query itself would be beneficial: that way the lookup would only change if the contents of the query itself changed. It would be nice if the hash could ignore whitespace changes, but that would be bonus points.
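One way this could look at build time in Node, assuming the graphql package is available for normalization (the helper name is made up):

```ts
import { createHash } from "crypto";
import { parse, print } from "graphql";

// parse() discards formatting entirely and print() re-serializes the AST
// in graphql-js's canonical style, so two documents that differ only in
// whitespace hash to the same id.
function queryId(query: string): string {
  const normalized = print(parse(query));
  return createHash("sha256").update(normalized).digest("hex");
}
```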

mliszewski avatar Jun 14 '17 15:06 mliszewski

Yes - I agree that using a hash instead of a sequential ID would be much better!

@tlvenn at Facebook they basically store every query ever persisted in a database and load them onto the various GraphQL servers when needed, with caching. That's what we expect people to do with this tool as well, but we haven't had time to write much content about how to set up the whole workflow. We'd definitely welcome some help with that.
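A rough sketch of that lookup-with-caching workflow, with an entirely hypothetical store accessor standing in for the database (this is not Facebook's actual code):

```ts
// Each GraphQL server fronts the shared query store with a small
// in-memory cache keyed by persisted-query id.
const cache = new Map<string, string>();

async function resolvePersistedQuery(
  id: string,
  store: { fetchQuery(id: string): Promise<string | null> }, // hypothetical accessor
): Promise<string | null> {
  const cached = cache.get(id);
  if (cached !== undefined) return cached;
  const query = await store.fetchQuery(id);
  if (query !== null) cache.set(id, query); // only cache known ids
  return query;
}
```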

stubailo avatar Jun 14 '17 16:06 stubailo

Although the hash function idea is interesting, hash function support in the browser is pretty bad, and it probably isn't a good idea to implement our own hash function within this project. A possible solution is to add some kind of dependency that offers a simple hash function in the browser.
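For example, a small package such as js-sha256 (a dependency-free npm module that runs in browsers) could supply the hash without this project rolling its own:

```ts
// Hash a persisted query in the browser via a small dependency
// rather than a hand-rolled hash function.
import { sha256 } from "js-sha256";

const id = sha256("query CurrentUser { me { id name } }"); // hex digest string
```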

Poincare avatar Aug 19 '17 20:08 Poincare