rescript-apollo-client icon indicating copy to clipboard operation
rescript-apollo-client copied to clipboard

Removing catch around parse function, and result types

Open jfrolich opened this issue 3 years ago • 22 comments

There is some defensive programming in the client where there is an expectation that the parse function can fail. GraphQL is a protocol with certain guarantees. That means we can assume data shapes and that in turn means that the parse function would never raise an exception unless there is a bug in graphql-ppx.

Can we remove these checks?

jfrolich avatar May 25 '21 05:05 jfrolich

Or are there certain scenarios where you found that parse raises an exception?

jfrolich avatar May 25 '21 05:05 jfrolich

Attempting to parse partial data was one case: https://github.com/jeddeloh/rescript-apollo-client/issues/46

I think there are other reasons to do this, though. Are there tradeoffs besides the cognitive overhead of having to deal with results in some parts of the code?

jeddeloh avatar May 25 '21 13:05 jeddeloh

Attempting to parse partial data was one case: #46

I think there are other reasons to do this, though. Are there tradeoffs besides the cognitive overhead of having to deal with results in some parts of the code?

would partial data be typed as nullable anyway? Or perhaps we should have a annotation that data might be partial that gets picked up by the ppx (so all types are nullable)? How does partial data work in Apollo? Is it on a fragment basis?

jfrolich avatar May 26 '21 01:05 jfrolich

It looks like when returning partial data everything is fair game for apollo. So basically to be able to support this, all fields need to be optionals. So a top level query directive like @ppxPartialData that makes all fields optional might make sense? We can also add a type definition (opaque or a poly variant) to the module, so we can make it impossible to use a hook that returns partial data on a query that doesn't have this annotation.

jfrolich avatar May 26 '21 01:05 jfrolich

That does seem like a better solution to a partial data scenario.

Here are a few other examples of when parse might raise an exception:

  • Reading from the cache
  • Schema mismatch (outdated client or some sort of infrastructure issues)
  • Poor graphql implementation (I was interfacing once with a Rails server that was giving me html instead of json in some error states)
  • Varying schema (I know Hasura actually changes the schema depending on role and permissions. Should you be compiling with different schemas? Sure. But maybe you mess it up, or you don't have time at the moment, IDK)
  • In the middle of developing when you haven't updated the schema

In general my thought is that I never trust anything coming into the system even if it's supposed to work. I personally use Apollo Client in situations where it's absolutely critical that we perform some other action if something were to fail, so this just in general felt like the right tradeoff to me.

Could we discuss more what the pain is you're feeling around this?

jeddeloh avatar May 28 '21 19:05 jeddeloh

  • Reading from the cache

You'll read a fragment/query from the cache, right? These should have the same guarantees.

  • Schema mismatch (outdated client or some sort of infrastructure issues)

Yes, but the GraphQL protocol should never break in these scenarios as it is backwards compatible. Usually you'll mark fields that are going to disappear as deprecated first. I work on a native app, with old versions around, so this is always a concern. BTW if you do things like drop fields or something like that, your app is going to break in some way anyway. Crashes are better IMO, because it would trigger alerts (and the server can be fixed quickly).

  • Poor GraphQL implementation (I was interfacing once with a Rails server that was giving me html instead of json in some error states)

In these scenarios a crashing app is not that bad. It's more easily diagnosed, and it's exceptional behavior.

  • Varying schema (I know Hasura actually changes the schema depending on role and permissions. Should you be compiling with different schemas? Sure. But maybe you mess it up, or you don't have time at the moment, IDK)

Hmm that is strange. Authorization should be separate from the schema. Also changing schemas at runtime is not really compliant with the spec I believe. Do you have an example?

  • In the middle of developing when you haven't updated the schema

But is an exception not better than a silent error?

In general my thought is that I never trust anything coming into the system even if it's supposed to work. I personally use Apollo Client in situations where it's absolutely critical that we perform some other action if something were to fail, so this just in general felt like the right tradeoff to me.

GraphQL is a protocol, so it does have guarantees about some things. If you are not going to trust it, why would you type it? In certain exceptional situations you can do you own exception handling? (same as in JS?)

jfrolich avatar May 29 '21 01:05 jfrolich

Perhaps we can have vanilla hooks that do not do parsing on the data and work with the Raw.t? So you do the optional parsing yourself (actually most of these hooks can be zero-cost). The hooks that work with non-raw can be using these other hooks under the hood. Would also be interesting to have for performance in some scenarios.

However working with the Raw.t can also result in an exception if you don't trust the data. If you are uncertain you have to convert it to Js.Json.t and use the safe methods.

jfrolich avatar May 29 '21 01:05 jfrolich

Haha, I think we're just rehashing the exact same discussion we had around promises. I think a function that returns a result should always return a result and you think exceptional circumstances should be an exception. I obviously find the experience of using results to be superior in all the scenarios where you describe exceptions superior. I'm not quite sure why we're not able to easily align on this issue. I do, however, trust you greatly (more than myself) as a programmer. I don't know that there is a "right" answer either. Both have tradeoffs, IMO. If you have some passion to make this change, let's do it!

GraphQL is a protocol, so it does have guarantees about some things. If you are not going to trust it, why would you type it? In certain exceptional situations you can do you own exception handling? (same as in JS?)

Not going to break down each of the other points, but I felt maybe it's worth clarifying my definition of "trust" here. I wouldn't say I trust it unless I have no exception handling (absolute 100% trust). And if I need exception handling, I think those exceptions should be accounted for in the result. This lack of "trust" doesn't invalidate the usefulness of the types in my eyes.

Perhaps we can have vanilla hooks that do not do parsing on the data and work with the Raw.t?

I think I would lean toward choosing one path and keeping things simple.

jeddeloh avatar May 30 '21 21:05 jeddeloh

I do, however, trust you greatly (more than myself) as a programmer.

Well you shouldn't do that haha

I think a function that returns a result should always return a result and you think exceptional circumstances should be an exception

Yes. The problem with trying to deal with exceptional circumstances, you'll end up with overly defensive programming and complexity starts to increase exponentially. There is a cost to dealing with results. Code becomes less readable. So I think in general it should only be done when there is a failure that is expected. I recently used a codebase that used results everywhere, it's not pretty. And refactoring it moving some exceptional cases to exceptions made the code way more readable (also easier to debug btw, because you know where the crash happened).

I think those exceptions should be accounted for in the result.

I think we should compare this to JS or TS. You don't extract the code that accesses the data that comes back from GraphQL, and wrap it in a try clause in the event that some fields are not available (if you know that they match the GraphQL types).

Thinking more deeply. What are the specific things we are catching?

  • Field is not available and is not nullable in GraphQL

I think all other cases would not raise an exception in the parse stage (if the type stages it wouldn't raise an exception, it would just mistype the result).

This could also be fixed with the thing we were talking about earlier? (Making all fields nullable in case of delivering partial results).

jfrolich avatar May 31 '21 01:05 jfrolich

Well you shouldn't do that haha

Regardless, you put a ton of energy into the community, graphql-ppx, and this library. If we can change the ergonomics of the library for the better for you, that’s a pretty big force in favor. It would be different if anyone else contributed 😄

Yes. The problem with trying to deal with exceptional circumstances, you'll end up with overly defensive programming and complexity starts to increase exponentially. There is a cost to dealing with results. Code becomes less readable. So I think in general it should only be done when there is a failure that is expected. I recently used a codebase that used results everywhere, it's not pretty. And refactoring it moving some exceptional cases to exceptions made the code way more readable (also easier to debug btw, because you know where the crash happened).

I've also been in codebases that used results heavily and found the experience to be really great. I wonder why our experiences are so different. I’m going to continue with the discussion because I’m curious. But before I do, I’ll state that while I would personally stick with what we have since it feels correct to me from a philosophical perspective, I know it has tradeoffs and isn’t even really addressing a problem that is likely to occur. I’m perfectly fine going in this direction if you feel strongly we should. But it’s still not clear to me what problem we’re trying to solve here, so even if it’s the better direction, I cannot say if it’s worth unwinding the try/catch 😅

Since we’re talking about results in general, I’ll try to give my perspective on their usage in general before moving on to whether this should throw an exception. I think all of us that have dealt with results realize it is a rabbit hole where you quickly reach the conclusion that every single function could possibly throw an exception, so all functions become results returning. We can probably all agree that is a path to madness? Instead we adopt more of a “functional core, imperative shell” style of programming where everything is checked at the edges of the program and results are constrained only to things that interface with the outside world. When limited in scope to the edges of the program, I really don’t find results to be adding much complexity, and the argument of added complexity isn’t swaying me very much. In particular, we’re already using a result and will continue to do so in this case? Could you tell me a bit more about how the argument against results in general applies here if we’re still going to use a result? I think I would understand if the argument were to go back to a straight ApolloResult object and throw exceptions in all other cases, but I don’t think you’re arguing for that. So you must be honing in on added complexity within the result handling rather than arguing against use of results in general? Okay, interesting. Bear with me as I think out loud why throwing exceptions in this case would be better.

In a typical scenario, let’s pretend our possible responses can be represented like so: NetworkError | PermissionDenied | GoodResponseButErrorsInPayload | InternalServerError | UnexpectedException. If I’m to give a good user experience, I’m going to handle each of these cases differently, even if it’s just to respond with a slightly different failure message. How are you handling this in your apps? The addition of UnexpectedException seems trivial if you’re handling the other cases? Why is there value to throwing that one case? What if PermissionDenied is extremely unexpected for some call? I should throw that, too? I feel like we’ve only added complexity going from a single concept that was nicely type-checked and completely transparent to two concepts that are less so.

I think we should compare this to JS or TS. You don't extract the code that accesses the data that comes back from GraphQL, and wrap it in a try clause in the event that some fields are not available (if you know that they match the GraphQL types).

Sure, it’s JS. But I don’t want a JS? I want something more along the lines of an elm-lang that has escape hatches and great react support. Doing all of this checking at the edges has costs, but I’ve always felt them worth paying. Maybe it’s just me, but in a functional language, results are sacred. Returning a result implicitly states to me, “I’m not going to throw an exception”. This mixing and matching is extremely unexpected and breaks some fundamental invariant I rely on.

Thinking more deeply. What are the specific things we are catching?

  • Field is not available and is not nullable in GraphQL

I think all other cases would not raise an exception in the parse stage (if the type stages it wouldn't raise an exception, it would just mistype the result).

Now that you mention it, it’s clear I was operating on assumptions from what graphql-ppx used to do. Didn't it used to do a full decoder for all the data coming in? Now it sounds like it's just trusting and casting GraphQL types? Sorry, it's been a busy year and a while since I've even looked at the ppx output. If that's the case, I've probably been making many assumptions about how much checking is going on at the edges. You’ve already reasoned that it’s extremely unlikely, but my preference would still have been to check everything. Like, look at how you reasoned these things shouldn’t happen, but we have the poor guy struggling with partial data. Yeah, we can find a solution to that specific problem but it feels like there’s always something else and I don’t trust myself to be able to always outthink it. The ideal solution to me would have been that graphql-ppx still did the decoder by default but had the option to enable “performance mode” or something that just cast to the types and saved code size and performance. When enabling it you know to be careful when using it in critical code paths.

I do think the community seems much more aligned with your way of thinking, though. I used reason-promise for a while because it was so widely recommended, but it ended up almost biting us soo hard. It completely bails on any exception in order to be "typesafe". Your only mechanism for handling such error is a global uncaught error handler or manually wrap every single step in a promise chain with try/catch and manually convert exceptions. Things were further compounded by some other bugs in the library, but even without, I just can't believe that is acceptable exception handling to anyone. And yes, they did come up due to something incredibly stupid, and yes it would have been disastrous for the business if we didn’t know about that issue with reason-promise and used our own promises that behaved as we expected. That's not to disparage the library, I think there is tons to like about it! And not that I'm doing any better here! But clearly I'm missing something about exception handling. 😂 I think your suggestion overall also feels like keeping things as light as possible which seems to be more in line with the rescript philosophy.

Thoughts?

jeddeloh avatar Jun 05 '21 16:06 jeddeloh

Actually I agree with most of what you are saying. I am only of the opinion that GraphQL is a typesafe protocol, and once you query the schema there is a contract between the client and the server that is the GraphQL schema. Query.parse will only potentially raise an exception if that schema has breaking changes. So for instance a Article has a comments field that is NON_NULL, so we can safely get comments.length. Now when the schema changed to make it nullable, our code will crash. In my opinion a crash here is fine because we intentionally made a breaking change in the schema.

If we accidentally made this change (we were not aware that this field was already used in the app when we made the breaking change), I would say a hard crash is good. Because this problem will be caught in QA. If we fail silently and not crash it's harder to find and debug the error.

Actually NON_NULL being made nullable is the only breaking change I can imagine (because adding fields, unions, interfaces or enums is not breaking), and removing a query/mutation will fail in the networking stack). And it will only fail if NON_NULL objects are made nullable (for scalars it will fail not in the parse function because it will just coerce without checking nulls, so it will fail somewhere in the user's code, so you have a crash anyway even with this exception handling).

If we want to support partial results, this should be properly typed in graphql-ppx (probably with an annotation) because parse won't work correctly with partial data even if we catch the object access, and for scalars the code can crash anyway because ReScript will not know a scalar can be potentially null.

I also have some issues with exceptions being swallowed in my application potentially somewhere in rescript-apollo-client (but I didn't find the cause yet) but that's probably unrelated.

jfrolich avatar Jun 07 '21 01:06 jfrolich

Didn't it used to do a full decoder for all the data coming in?

We don't do full checking of the JSON because the number of checks we need to do used to generate an exponential number of permutations in the generated code, which blew up code size and hurt performance quite a bit for larger/deeper queries. See this issue.

jfrolich avatar Jun 07 '21 01:06 jfrolich

I'm aligned on the GraphQL part at this point, thanks for bearing with me! But if we return to the original issue I referenced about partial data, it was never the desire to get partial data, it was an unintentional situation because there were errors. In that issue (or in a linked graphql-ppx issue) you were suggesting that it's rescript-apollo-client's responsibility to avoid parsing when there are errors. My solution here I thought was the most elegant of all the options: just always try to parse the data without crashing the app because it could mask errors.

So, we don't get to just drop the try/catch, right? Are you basically proposing we swap the try/catch around parse for an if/else checking for errors? What exactly do we gain by doing that? You seem to be worried mostly about exceptions being swallowed and silent failures? Is this the main issue we're trying to solve for at this point?

jeddeloh avatar Jun 10 '21 14:06 jeddeloh

Um I think I didn't realize the impact. If we are parsing partial data (whether it's due to an errorPolicy or with the partialResults option), we should have the types and the parse function that graphql-ppx generates adapted for that (basically treat all fields as nullable). So this is something that has to be fixed in graphql-ppx. We can probably also find a way to make it fully typesafe.

jfrolich avatar Jun 10 '21 14:06 jfrolich

Perhaps we can have a parsePartial function that is an alias of parse when @ppxPartial is used as a directive to instruct the ppx that all data can be partial.

jfrolich avatar Jun 10 '21 14:06 jfrolich

so if you don't have the annotation, it will fail to compile because parsePartial is not available.

jfrolich avatar Jun 10 '21 14:06 jfrolich

BTW the partialData story is interesting. I saw that Apollo is going to have a more precise partial rendering technique in 4.0 that is more similar to Relay with useFragment and suspense.

jfrolich avatar Jun 10 '21 14:06 jfrolich

I don't have a ton of clarity at this point how we make things typesafe for errorPolicy when set globally, or whether we can simply get around the issue in another way, or that that issue even needs to be addressed, but I'm on board with the plan of removing the catch in general.

I also have some issues with exceptions being swallowed in my application potentially somewhere in rescript-apollo-client (but I didn't find the cause yet) but that's probably unrelated.

I have an irrational level of fear of this. Well, that and the dreaded uncaught null or undefined with no stack. Ironically, it's one of the reasons we ended up with results here in the first place.

BTW the partialData story is interesting. I saw that Apollo is going to have a more precise partial rendering technique in 4.0 that is more similar to Relay with useFragment and suspense.

Interesting! I have not been following Apollo all that closely of late.

jeddeloh avatar Jun 14 '21 13:06 jeddeloh

I don't have a ton of clarity at this point how we make things typesafe for errorPolicy when set globally, or whether we can simply get around the issue in another way, or that that issue even needs to be addressed, but I'm on board with the plan of removing the catch in general.

Yes. But errorPolicy is not typesafe right now as well (most type errors will happen outside of parse). So the only way to make this typesafe is again to make everything in the query nullable. Which is the same thing you need to with partial data.

BTW to be able to do this, there needs to be a change to the ppx, so it's not a quick fix, but a thing we can implement over time.

I have an irrational level of fear of this. Well, that and the dreaded uncaught null or undefined with no stack. Ironically, it's one of the reasons we ended up with results here in the first place.

👍

jfrolich avatar Jun 14 '21 14:06 jfrolich

Yes. But errorPolicy is not typesafe right now as well (most type errors will happen outside of parse). So the only way to make this typesafe is again to make everything in the query nullable. Which is the same thing you need to with partial data.

Yeah, that's clear to me. I think what I'm really getting at with my comment about a lack of clarity, is that I would wait to remove the catch until we have a strategy in place to either partially parse or not parse when appropriate. It just seems unfortunate in instances like that initial issue I referenced that had the partial data, we have someone trying to track down why their app is crashing and not understanding the nuances. As it stands, we would preserve the authentication error, and even give you the data at the same time if we can parse it, as opposed to obscuring the real cause behind a type error and crashing the app.

Are you proposing we remove it now even without the ppx additions?

jeddeloh avatar Jun 14 '21 16:06 jeddeloh

No we have to wait for the ppx addition.

But just saying that with the current approach it crashes mostly outside of the parse function, because most of the time the parse function just does a dumb type coercion based on the known schema. So the crash would be happening when trying to access a field that does not exists.

jfrolich avatar Jun 15 '21 01:06 jfrolich

👍

jeddeloh avatar Jun 15 '21 14:06 jeddeloh