graphql-upload icon indicating copy to clipboard operation
graphql-upload copied to clipboard

Change scalar to support serialization.

Open yaacovCR opened this issue 4 years ago • 10 comments

Helpful for schema proxying, see:

https://github.com/yaacovCR/graphql-tools-fork/blob/with-apollo-upload-client/src/scalars/GraphQLUpload.ts#L10

Maybe your package could provide two scalars, one for server?

yaacovCR avatar Mar 17 '20 12:03 yaacovCR

Are you able to explain in more detail what "schema proxying" is, and how it works, for this change to be necessary? It's a little hard to make an informed decision without trying to reverse engineer the tools you’re working on.

If there is no better way, I'm not against the idea of changing the GraphQLUpload scalar to serialize: value => value — as long as there are still helpful errors somehow for people misusing the Upload scalar; perhaps if they mistakenly use it as the type for fields in non input types.

jaydenseric avatar May 23 '20 07:05 jaydenseric

Link: https://www.graphql-tools.com/docs/schema-stitching

Schema proxying is inaccurate, in a nutshell it is one graphql server proxying a portion of a query to another.

It requires reserializing parsed input arguments.

yaacovCR avatar May 24 '20 01:05 yaacovCR

Hi @yaacovCR, upload objects contain streams of bytes that were originally sent via multipart HTTP request. I'm certainly no expert when it comes to Apollo's schema stitching pattern, but it's my understanding that it assumes that all scalar values can be represented in JSON. This assumption directly opposes the premise and design of this library which leverages the GraphQL multipart request spec to efficiently make large binary payloads accessible to GraphQL resolvers.

The Upload scalar is designed to be used as an input only (notice that it is not called File or Blob) because there was no desire to change the response type of a GraphQL server from application/json. There may be some real benefit to using a multipart/mixed content type in responses, and if such behavior were to be specified, then support for large binary scalars become more symmetrical. Of course, this doesn't really help your use case though.

I'm going to leave this open for now, but I anticipate that it will get closed as being "out of scope."

Does this conflicting ideology make sense to you? Is there a specific serialization format that you think would actually be appropriate here?

mike-marcacci avatar Nov 30 '20 04:11 mike-marcacci

Hi @mike-marcacci !

Great to get some attention on this, even if negative. Thanks for all your work on this library as well as fs-capacitor (and for your advocacy in favor of TS in upstream graphql-js!). While I'm at it, thanks as well to @jaydenseric for his work on this library!!!

In terms of this issue, I do hope you and @jaydenseric reconsider. The only reason I see for serialization being disabled is the suggestion that there is no use case, but there is indeed a use case....while serialization is usually used for preparing GraphQLOutputType values, but is also can be used to reverse the parsing of GraphQLInputType values for when a resolver needs to proxy them to a different service.

A few points:

(1) Apollo spun off schema stitching in favor of Federation some time ago. Currently, schema stitching as part of the graphql-tools library is managed by the Guild. I also help out when I can.

(2) Federation is a pattern of GraphQL subschema development -- that when adhered to -- is compatible with Federation-compatible gateway servers, with a canonical implementation by Apollo. Schema stitching is the creation of a new gateway schema that proxies requests to any GraphQL subschema, merging types as necessary. Schema stitching therefore creates a new schema that can be served by any GraphQL server. Configuration allows any subschema to be merged on the gateway, although certain patterns of subschema development may require less configuration.

(3) I am not familiar with how Federation handles scalars, but based on above, schema stitching handles them perfectly fine no matter what the internal representation is, as the subschemas are all perfectly valid GraphQL schemas, and the gateway schema is also a canonical GraphQL-JS GraphQLSchema.

(4) In fact, I spun up a demo using graphql-upload in which files are proxied as streams via the gateway schema to the subschemas (https://github.com/ardatan/graphql-tools/blob/be1a1575bb299b64993e81bca78cd053789cfee0/packages/links/tests/upload.test.ts)

(5) The only "trick" required to make this work is that the executor for a subschema needs to replace the individual upload scalars (which are promises resolve to an object from which the stream can be extracted) with the stream itself. @jaydenseric was kind enough to make the createUploadLink configurable with custom options that allow this to be easily done when combined with a custom Link that awaits the results of all those promises.

(6) Well, the other trick is that there is a bug/limitation in the more widely-available FormData server polyfill or a bug/limitation within the node-fetch v2, depending on how you look at it, that does not let FormData stream streams. The example above uses a custom FormData polyfill as a workaround, but there is an easier workaround with node-fetch v3. See https://github.com/form-data/form-data/issues/394#issuecomment-593645033.

(7) In any case, my demonstration relies on a custom scalar that simply allows GraphQLUpload variables to be serialized. Here it is:

const GraphQLUpload = new GraphQLScalarType({
  name: 'Upload',
  description: 'The `Upload` scalar type represents a file upload.',
  parseValue: value => {
    if (value != null && isPromise(value.promise)) {
      // graphql-upload v10
      return value.promise;
    } else if (isPromise(value)) {
      // graphql-upload v9
      return value;
    }
    throw new GraphQLError('Upload value invalid.');
  },
  // serialization requires to support schema stitching
  serialize: value => value,
  parseLiteral: ast => {
    throw new GraphQLError('Upload literal unsupported.', ast);
  },
});

Note that the above scalar also works when used with v9 and v10, but that's completely besides the point.

The key is that serialization should just not throw, which is the current behavior....that is really unhelpful.

Anyways, if you've gotten this far, thanks for reading!

To return to where I started, this library seems to forbid serialization because there is no use case, and attempts to warn end-users that they are using the library incorrectly--but there is a use case. Serialization is usually for preparing GraphQLOutputType values, but is also can be used to reverse the parsing of GraphQLInputType values for when they need to be sent to a different service.

On the other hand, perhaps you might feel that this use case is so specialized, and so easy to work around by simply redefining the GraphQLUpload scalar, as I have done, that this is not worth changing. I happen to disagree, but I see that point as pretty fair.

I think that if you decide that you truly won't fix, you may as well close the issue, however. :)

yaacovCR avatar Nov 30 '20 08:11 yaacovCR

Above has been edited.

yaacovCR avatar Nov 30 '20 08:11 yaacovCR

@yaacovCR

The problem is that what you are trying to do is not really serialization. The custom scalar serialize method is supposed to serialize the scalar to a JSON value, which doesn’t make sense for a file upload promise/stream.

In my earlier comment:

Are you able to explain in more detail what "schema proxying" is, and how it works, for this change to be necessary?

I'm conceptually familiar with what schema stitching is; when I asked for clarification about "schema proxying" it was because I wasn't sure if that was what you were referring to as I hadn't heard it called "proxying" before.

By asking "how it works", I was hoping you would share links to specific lines of code where the schema stitching implementation tries to "serialize" the scalar. Chances are there is an obvious, elegant solution. For example:

 new GraphQLScalarType({
   name: 'Upload',
   description: 'The `Upload` scalar type represents a file upload.',
   parseValue(value) {
     if (value instanceof Upload) return value.promise;
     throw new GraphQLError('Upload value invalid.');
   },
   parseLiteral(ast) {
     throw new GraphQLError('Upload literal unsupported.', ast);
   },
   serialize() {
     throw new GraphQLError('Upload serialization unsupported.');
   },
+  extensions: {
+    // Need to decide on the best name for this…
+    serializeForSchemaStitching: value => value
+  }
 })

Then in the schema stitching implementation code, by convention it could first check for scalar.extensions.serializeForSchemaStitching and use that, or else fall back to regular scalar.serialize.

I haven't used extensions for a custom scalar yet, but this seems to be the sort of thing it’s intended for.

Something else I hoped you would answer:

as long as there are still helpful errors somehow for people misusing the Upload scalar; perhaps if they mistakenly use it as the type for fields in non input types.

With your alternative GraphQLUpload scalar that does serialize: value => value, what will happen if a user accidentally makes a schema like this:

scalar Upload

type Query {
  userCount: Upload! # Was supposed to be Int!
}

Then queries userCount? I'm a bit rusty, so not really sure off the top of my head what will happen. Would they correctly get a GraphQL serialization error, is that what happens with the official GraphQLUpload scalar?

Maybe they would get a Upload value invalid GraphQL error before that point:

https://github.com/jaydenseric/graphql-upload/blob/1398e62a53ad8075d6d4fc0692e326ceef224ca0/public/GraphQLUpload.js#L82

It's super dumb, but what if a user tries to use the Upload scalar going the other way, as a "download". What if they even construct an Upload instance for it's value so it won't error at the parseValue step.

jaydenseric avatar Nov 30 '20 10:11 jaydenseric

Hi @jaydenseric !

Thanks again to you both for engaging with me on this. I appreciate your time, it's definitely not taken for granted.

I think you are correct that I don't have a good workaround available for guarding against uses of the Upload scalar as an output value if you remove the throw from serialize. From what I recall, that's what discouraged me from replying to your earlier post, and I guess when I saw @mike-marcacci's follow-up, I didn't scroll up and see what we were up to in this discussion.

I also should have acknowledged your point instead of just not responding!

I think the workaround you have suggested is fine enough, although it does make me a little sad to have to resort to that just to still be able to guard against using the scalar as an input. We can insert a check for that within makeExecutableSchema from graphql-tools, but that will not help the wider ecosystem. From what I recall, there has been some discussion within the graphql spec working groups about how to define some types to be able to be used for BOTH input and output, and in that context it might make sense to formally restrict a type to just input, but that would have to wait for another day, probably far in the future.

Of course, I watch this repository, and so I see how many questions you get in terms of basic usage of this very popular package, so I imagine from your perspective and time, the more checks the better!

Here are those code links. They confirm that the workaround you have suggested would be viable: https://github.com/ardatan/graphql-tools/blob/294dedda609e366c60d149b194714d54210c4a03/packages/delegate/src/transforms/AddArgumentsAsVariables.ts#L148

https://github.com/ardatan/graphql-tools/blob/ae7c968deb2e6f51024a385abfc6455c0db5a5df/packages/utils/src/transformInputValue.ts#L36-L38

So, would you be interested in adding it? Or do you think the few users who would actually make use of this would be better served by just using my current workaround to override the scalar?

All the best, @yaacovCR

yaacovCR avatar Nov 30 '20 12:11 yaacovCR

Note that https://github.com/graphql/graphql-js/pull/3049 alludes to above discussion. Looks like graphql-js will start using serialize for input values as workaround for issues involving default values. This, as noted there, is not technically correct, perhaps temporary. Movement on that issue might help here as well.

yaacovCR avatar May 06 '21 13:05 yaacovCR

@yaacovCR thanks for your comment there, glad to have you on the case :)

I had a look to see if there was anything actionable on this issue to include in our coming major release, but it seems to be in a state of flux so we might just ship this version and look at a permanent solution based on what GraphQL.js comes up with in a following release.

jaydenseric avatar May 07 '21 00:05 jaydenseric

Hi there!

Sort of came upon this issue because I got the Upload literal unsupported. exception when using a delegateToSchema from graphql-tools and I guess somewhere in the depths of that one serialization happens... Anyway, for for now I use a custom scalar with serialization sort of allowed (serialize: value => value) as mentioned before (thanks by the way! @yaacovCR ).

But my lazy +1 for this anyway. :o)

Songworks avatar Jul 16 '21 14:07 Songworks