
GraphQLSchema serialization using pickle

Open rpgreen opened this issue 3 years ago • 17 comments

Is it possible to serialize an instance of GraphQLSchema?

We would like to dump and load an instance of GraphQLSchema to/from a file for performance reasons.

It does not appear to be serializable using pickle by default.

>       res = pickle.dumps(schema)
E       AttributeError: Can't pickle local object 'extend_schema_impl.<locals>.build_object_type.<locals>.<lambda>'

Thanks in advance.

rpgreen avatar Jun 02 '22 15:06 rpgreen

It's currently not supported. You can serialize a schema to SDL using print_schema. But you can't pickle a schema because of the resolver functions, and also because GraphQL-core uses lambdas a lot; neither can be pickled.

Will keep this open as a reminder to maybe/someday investigate whether supporting this would be feasible somehow without totally reinventing GraphQL-Core. If anybody wants to work on this or has comments or ideas how to solve this, please feel free to jump in.

Cito avatar Jun 02 '22 18:06 Cito

@rpgreen Out of curiosity, where do you expect performance gains when serializing your schema?

erikwrede avatar Jun 02 '22 18:06 erikwrede

We are using build_schema() at runtime and there's a significant performance hit (several hundred ms up to several seconds).

I assume it would be much faster to rehydrate a serialized schema object if that was possible.

rpgreen avatar Jun 02 '22 18:06 rpgreen

@rpgreen Interesting. Maybe it would be worthwhile then to profile why/where build_schema() is so slow and whether there is potential for optimization. First step would be to find out whether it's the parse() step or the build_ast_schema() step of build_schema() that is the main problem.

Cito avatar Jun 02 '22 18:06 Cito

I did a cProfile profiling run on the current GitLab GQL schema. You can easily download it using Altair, selecting https://gitlab.com/api/graphql and running Export SDL in the docs section.

Some insights: the schema file has 51k lines, including blanks and lots of documentation.

Building the schema took a total of 3.8 seconds using cProfile on Python 3.10, of which 2.67s were spent in build_ast_schema and 1.16s in parse. In total, 52k nodes were parsed. Most of the building time (2.4s) is spent in visit, with the nested calls taking 2s and the function itself taking 430ms.

These are the top bottlenecks (see Own Time): get_visit_fn is by far the largest bottleneck (by accumulated time); however, it is also called quite frequently. __init__ is the init function of Node, where the **kwargs are traversed and transformed into FrozenList. Inside parse I couldn't find similar bottlenecks, but I'm not familiar with the code of that part of the library (yet). Maybe (pure speculation) there is a way to reduce the number of get_visit_fn calls (1.3 million for 52k nodes, though there are multiple types of visit functions per node), but that would probably require significant rewriting of the entire build_schema part.

Maybe this helps someone more familiar with the internals of GraphQL-core to identify potential issues. Most of the long durations here simply seem to be caused by the large number of nodes in the schema, not by significant problems with the parsing/validation logic.

Keep in mind that this analysis is based on a quick cProfile iteration, so do not take it as scientific evidence 😉

erikwrede avatar Jun 02 '22 21:06 erikwrede

Thanks @erikwrede for the profiling. Did you use the latest version (3.2.1) of GraphQL-core? I'm asking because there have been some speedups and get_visit_fn is actually deprecated.

Cito avatar Jun 04 '22 19:06 Cito

@rpgreen I have looked into this a bit more. It seems you want to pickle only schemas that have been produced from SDL, which makes this issue a bit more restricted and easier to tackle.

The reason why this currently does not work is the implementation in extend_schema.py, which heavily relies on lambdas and local functions, which cannot be pickled. Refactoring this is probably not so easy, and would also deviate from GraphQL.js, which likewise uses anonymous and local functions. If anybody has ideas on how to solve this, please comment.

Also, I'm still not sure how much faster pickling really would be, and whether improving the performance of build_schema() would not be more worthwhile.

Cito avatar Jun 05 '22 21:06 Cito

Btw, I have added a test for pickling schemas here which is currently skipped.

Cito avatar Jun 05 '22 21:06 Cito

Just checked, seems like I messed up the version in my test env. The new version sees a performance improvement roughly equal to removing get_visit_fn from the statistics, parsing now takes 55% of the time instead of a third of the total run time. Might look into this further soon. Maybe there is some improvement potential in the DFS of visit, 400 ms own time for 50k nodes seems like a lot for a DFS (even using a profiler which is slowing things down).

erikwrede avatar Jun 05 '22 22:06 erikwrede

@rpgreen @erikwrede I have now created a branch pickle-schema that allows serialization of a schema built from SDL. It may be a bit slower when building schemas, as it uses a class instead of local functions and therefore needs attribute lookups, but it does not seem to be a big difference. Please check it out and give me feedback whether you think this should go to the main branch.

Cito avatar Jun 06 '22 14:06 Cito

@Cito @erikwrede Thanks for looking into this. We will try this out and share our results.

From our perspective, we would be fine with a higher one-time cost of building the schema as long as we have a way to quickly serialize/deserialize at runtime.

rpgreen avatar Jun 06 '22 14:06 rpgreen

Hi folks, We've done some prototyping with this and we are seeing an improvement. Can we get this merged into the main branch?

rpgreen-vendia avatar Jul 25 '22 12:07 rpgreen-vendia

@rpgreen-vendia thanks for the feedback. Will consider this when working on the next version. Please have some patience as I cannot work on this full time and am currently deep in other projects. Feedback from others is also appreciated.

Cito avatar Jul 25 '22 12:07 Cito

Hi there, Thank you for working on this feature. I made a POC with the pickle-schema branch and I'm seeing major performance improvements with this change but there are also some issues that have come up.

  1. I see that graphql-core has its own implementation of deepcopy that was not updated as part of the pickle-schema branch. Currently the issue I get when I try to deepcopy a previously pickled schema is
>               raise TypeError(
                    "Schema must contain uniquely named types"
                    f" but contains multiple types named '{type_name}'."
                )
E               TypeError: Schema must contain uniquely named types but contains multiple types named 'String'.

../.venv/lib/python3.9/site-packages/graphql/type/schema.py:258: TypeError
  2. When doing the introspection query, the introspection result omits the types field for a previously pickled schema. Sample code below:
import json
import pickle
from pathlib import Path

from graphql import build_schema
from graphql.utilities import introspection_from_schema

compiled_gql = build_schema(gql_schema)
pickled = pickle.dumps(compiled_gql)
unpickled = pickle.loads(pickled)
result = introspection_from_schema(unpickled)
resource_path_2 = Path("/Users/mmeyers/some_directory")
with open(resource_path_2 / "introspection.json", "w") as to_file:
    to_file.write(json.dumps(result))

Please let me know if I can give more detail on any of this and/or if I'm doing any part of this wrong. Thanks again!

Mika

mika-vendia avatar Aug 18 '22 00:08 mika-vendia

Thanks for the feedback @mika-vendia . Will look into this.

Cito avatar Aug 18 '22 10:08 Cito

I have started working on this issue again. First step was to update the pickle-schema branch by merging in the latest changes from main, because I plan to eventually release this as v3.3.

@mika-vendia I have also looked into your point 1 (deepcopy of a pickled schema). I was able to create a unit test reproducing the problem. The cause is clear: the unpickled standard scalar type objects are different from the original ones. I have fixed this now by replacing them with the originals when creating the schema. The unit test passes. Maybe you can check whether point 1 is also solved for you with the current pickle-schema branch.

Will continue to work on this later this week. Maybe there is a better way of solving this, more fundamentally by somehow guaranteeing that singletons are preserved in the pickle/unpickle process. Will also look into your point 2.

Cito avatar Sep 25 '22 21:09 Cito

Thank you, @Cito! I will test out issue 1 and let you know how it goes.

mika-vendia avatar Sep 26 '22 19:09 mika-vendia

@Cito - @mika-vendia has confirmed that the fix for the first issue is working as expected. Let us know if you need any more details on the second issue.

danvendia avatar Oct 20 '22 15:10 danvendia

@danvendia I have solved the issue now a bit more fundamentally in this branch and extended the tests. Should still be working. Will try to tackle the second issue next week.

Cito avatar Oct 23 '22 21:10 Cito

@danvendia Update: I have looked into the second issue, can reproduce it and have an idea how to solve it, even more fundamentally than the last fix. Will continue to work on it this week.

Cito avatar Nov 01 '22 22:11 Cito

@Cito - awesome! Thanks for the update!

danvendia avatar Nov 01 '22 22:11 danvendia

@danvendia The complete fix has now been merged into the main branch and will be available with the next alpha release. Please open a new issue if there are still any problems with this.

Cito avatar Nov 02 '22 15:11 Cito

@danvendia @mika-vendia @rpgreen-vendia: This is now available in v3.3.0a2 for testing.

Cito avatar Nov 03 '22 21:11 Cito

Just came by to say you rock @Cito :guitar:

Ambro17 avatar Nov 04 '22 13:11 Ambro17