ferry icon indicating copy to clipboard operation
ferry copied to clipboard

Ferry Generator 2.0

Open knaeckeKami opened this issue 11 months ago • 15 comments

Now that we have clarity around macros, it's time to start discussing the next iteration on ferry_generator.

ferry_generator has the following issues that I'd like to solve

  1. problems with nesting fragments #610
  2. generally, slow code generation, due to how built_value puts all Serializer instances in a single Serializers object, making all serializers becoming dependent on each other - slowing down build_runner https://x.com/mraleph/status/1778517432610771380
  3. code size - all these built_value classes lead to a lot of generated code
  4. We don't use Dart enums, even though they would be powerful enough by now #617

I don't think we could solve 2. and 3. without dropping built_value. 1. is a different issue, but unfortunately a fundamental flaw in the code generation logic, which would need to require a large portion of the generator to be rewritten.

I propose a new iteration of ferry_generator that does not depend on built_value.

Goals:

  • Users should still use the current code generation, ferry does not any strong assumptions on how the OperationRequest classes are generated (in fact, you could even write your own instances or use JsonOperationRequest without generating any code) Users will not be forced to migrate to use new versions of ferry.

  • If possible, users should be able to use both versions of ferry_generator in parallel - so migration does not have to be all at once. I think this should be doable by pointing the generators in build.yaml to the right files

  • Bugs with nesting fragments and interfaces are fixed

  • Code generation should be significantly faster

  • I want to investigate if using build_runner at all is necessary. We mainly parse graphql files, we don't need the features of build_runner a lot. It might be possible to do code generation optionally with a simple CLI tool.

  • generating ==, hashCode and copyWith() methods should probably be optional. If you don't need these, no reason to generate them

Feedback is welcome!

knaeckeKami avatar Feb 01 '25 20:02 knaeckeKami

I`m voting for using of build_runner because all other code generators runs that way. So, I prefer to get all that stuffs by one command. I would like to see version without built_value!

Is it possible to use freezed for generation of models?

Thank you!

alexandrim0 avatar Mar 13 '25 20:03 alexandrim0

@alexandrim0 freezed is also pretty dated with a lot of similar limitations to built_value. Both were made long enough ago that they have a lot of hacks to compensate for Dart limitations that have since been resolved. Notably, when and maybeWhen were just workarounds to deal with limitations of Dart switches at the time that have since been largely resolved by sealed classes and switch expressions.

If they were to still use build runner, and to still use a pre-existing data class/serialization package, I'd recommend dart_mappable which implements data classes in by far the most "dart idiomatic" way of all the available options (mostly by virtue of it being the newest and having been built after a lot of the modern Dart features like sealed classes, enhanced enums, switch expressions, etc).

My general 2 cents is regardless of which approach they take is: I'd love to be able to interact with Ferry-generated classes in as close to a fashion as possible to how a developer would interact with comparable hand written dart code. More specifically, I want to use switch expressions without _ default cases on all GQL enums and polymorphic classes. You could still provide when and maybeWhen, but they shouldn't be obligatory as they more or less are today with the built_value based generator.

I want this because:

  1. Switches are what I use in the rest of my code base and I want consistency
  2. Switches are much more powerful and expressive than when and maybeWhen
  3. IDE autocomplete and static analysis handle (proper) switches much better

Some blockers to using switch with Ferry enums and polymorphic classes today (the first two were already mentioned in the issue description):

  1. built_value enums aren't real enums
  2. Classes have painfully long type names that make switches on them impractical unless you use a fragment, but polymorphic fragments are bugged
  3. Polymorphic types aren't sealed classes so you don't get autocomplete or proper exhaustivity checking
  4. Polymorphic subclasses rely on when's orElse case for client-unknown types, there needs to be an UnknownFoo class or something for each superclass so I can write switches without default cases and still handle unknown types

caseycrogers avatar Mar 27 '25 17:03 caseycrogers

freezed gets better and already supports sealed classes. It drops when and maybeWhen also :-) But I have no words against dart_mappable. And I would like to see evolution of ferry `cause I relay on it.

But I doubt if my skills able me to offer PRs...

So, thank you!

alexandrim0 avatar Mar 29 '25 15:03 alexandrim0

I don't think I'll use freezed.

freezed uses json_serializable, so that would be 3 layers of code generation stacked on top of each other. What a mess that would be to debug! And code generation time would naturally also suffer.

Plus, to support setting fields to null, freezed generates a lot of code. For example, for every freezed class, it generates 4 classes with all the fields just for copyWith().

Not to badmouth freezed, I think stuff like this does not matter that much for typical for model classes in a typical mobile app, but ferry has users that have schemas with many 10k lines of code and hundreds of queries, and the amount of code that is generated strains the analyzer and bloats the code size of the app.

I have heard requests to omit the generation of ==, 'hashCode', copyWith, etc. methods to keep the code size smaller (and not to strain the analyzer as much), which I would like to support in the next version of the generator.

If feasible, I would like to try to just have one pass of code generation.

knaeckeKami avatar Mar 29 '25 16:03 knaeckeKami

I`m voting for using of build_runner because all other code generators runs that way. So, I prefer to get all that stuffs by one command.

I will definitely support build_runner! However, I'm also exploring if it makes sense to allow running code generation independently of build_runner, since it can introduce significant overhead.

knaeckeKami avatar Mar 29 '25 17:03 knaeckeKami

I want to use switch expressions without _ default cases on all GQL enums and polymorphic classes.

Yes, we could use proper Dart enums and make selection classes with type conditions sealed, but keep in mind that the existing clients should not break when the graphql schema evolves and new types are added. So some kind of fallback mechanism would be needed (like the gUnknownEnumValue fallback in the current version)

knaeckeKami avatar Mar 29 '25 22:03 knaeckeKami

Yep I understand the need for a fallback type to handle client unknown subclasses.

I just don't want to have to use a default case/wildcard pattern because that undermines static analysis-I lose exhaustivity checking.

caseycrogers avatar Mar 29 '25 23:03 caseycrogers

This is an issue for us as well, the current single run on our project takes 6-7 minutes just for this.

  • Instead of freezed, it could be a simple typedef and records, as they are immutable (we actually thinking about shifting to a code first approach, where instead of using the documents, we only use the schema and create a document builder class from that to write queries, it's just a concept for now)
  • My take on built_runner as at the moment we can't do partial runs, nor have narrow outputs, we don't use the AST gens in our code, but it's still there and its easy to import it by mistake, could be replaced with a minimal CLI package
  • We also encountered that the IDE extensions around gql does not match the behavior of the #import syntax here (package: imports works in ferry, but the rest of the world does not handle that :D )

Just to note on this, we actually don't need most of the magic, just the exec layer and the types, and for production the gql strings which atm are built and not just raw strings.

We also wrapped the execution layer, it would be nice to allow the end users to add links to a specific points. (For example we have a terminating and responding link before the request link itself, which we only able to do if we create our own TypedLinkWithCacheAndRequestController

davidfoldiotp avatar Apr 24 '25 06:04 davidfoldiotp

Something came up today at work: We're increasingly using the descriptions in the GraphQL schema and finding it a valuable resource. It'd be especially helpful if descriptions from the graph were automatically copied into generated files as Dart Docs so that developers interacting with Ferry generated graph objects could see and read schema descriptions in their IDE (eg when hovering over an attribute, VSCode will show them any dart docs for that attribute).

This of course isn't practical while using Freezed, but I'm hoping moving to a custom build runner would make this a pretty easy feature to add to Ferry Generator 2.0!

caseycrogers avatar Apr 30 '25 13:04 caseycrogers

Yes, records + typedefs are one possible way to go, but they have a big drawback: there are no subtype relationships with records.

This would make working with unions / interfaces or even reused fragments awkward.

knaeckeKami avatar May 03 '25 21:05 knaeckeKami

Thanks for the detailed discussion everyone! I've been following along and wanted to share that I've been experimenting with some of these concepts recently.

I've been playing with dart_mappable in a prototype implementation of a ferry generator 2.0 and it seems to align well with the modernization goals outlined here. It handles the modern Dart type system nicely and provides good serialization options.

Since fragment implementation issues have been addressed recently, I'm particularly interested in the infrastructure side of things - how are people envisioning this to be organized? Keeping backwards compatibility seems easy, if we version any new gql library as a 2.0, but how should issues etc. be organized?

Specifically:

  • How do we balance supporting both build_runner integration and potentially a faster standalone CLI? (I did build a CLI in my first prototype since build_runner didn't give us much, but we want backwards compatibility, right?)
  • What's the preference for organizing the codebase - separate packages for each component or a more monolithic approach with clear internal boundaries?
  • For existing users, what would be the ideal migration path to minimize disruption?

Happy to contribute to this effort and excited to see where it goes!

caffeineflo avatar May 06 '25 19:05 caffeineflo

I've never used dart_mappable, but heard only good thing of it!

if we version any new gql library as a 2.0, but how should issues etc. be organized?

I originally envisioned releasing a new library, rather than a new major version. That could help users with large codebases do a gradual migration. If there are any drawbacks to this, I'm open to abandoning it, but atm I don't see any big ones.

How do we balance supporting both build_runner integration and potentially a faster standalone CLI? (I did build a CLI in my first prototype since build_runner didn't give us much, but we want backwards compatibility, right?)

I think if we use dart_mappable, it makes sense to just stick with build_runner, since dart_mappable requires build_runner anyway (I think?)

What's the preference for organizing the codebase - separate packages for each component or a more monolithic approach with clear internal boundaries?

I think a single package (with a good internal structure) with one builder is enough. gql_build uses different builders for different aspects, but that leads confusion of users when setting up their build.yaml, and required duplicate configuration.

knaeckeKami avatar May 10 '25 23:05 knaeckeKami

Here is my take on this (+ side note: we are pushing internally to contribute to open source projects, so we can provide feedback with a pretty large project at hand, or somekind of paid support from you as well)

I took on a look whats being actually used:

  • The GReq, GData, G*Vars, the generated output is actually mostly not needed in the shipped product, we only need Vars typed to make the requests, so allowing end users to narrow these down to their use case (what to generate in this case) would be awesome
  • The response types used mostly in 1 layer, we try to avoid passing the generated code around the code paths, this is not just because the architecture, but the fact that it's hard for most of the developers to use GSomeResponse_nested_nested_nested parts to be a valid name (hence we mostly map these to an other immutable structure records)
  • We did tinker with running a code generator as I mentioned before, the package:gql is pretty sufficient in a whole for that with the visitor pattern a package:code_builder was an amazing fluid apit to use on top of that to generate the structures we needed (it's not finished but it did reduce our whole generator run significantly, however I would advise to look into a IR cache, parsing the files and the schema could be based on it's hash or a watcher functionality, and regenerate the IR in that case, and then regenerate the necessary code parts based on that (look at the fragments now 👀)

For the migration part:

  • The separate package would be a must on our end, having the same package with 2 versions would be a nightmare :)
  • Since we would not need to touch the .gql files, the only part is the Req/Vars/Data and Responses (which if the structure is intact then should be easy) so a codemod (or autofix) could replace the requests with a new one I guess, separating the new and old with different naming would make it clear that we are on a different code path for that request which would be also nice to reduce mental overhead, this means that its mostly up to keeping the request's object structure at least for the migration phase in somewhat stable state

For the questions:

  • How do we balance supporting both build_runner integration and potentially a faster standalone CLI? (I did build a CLI in my first prototype since build_runner didn't give us much, but we want backwards compatibility, right?) I don't think you need to, if the migration can happen in parallel, so we can use new generated code and the old one until we phase out. That would probably require some work in the exec package or a new version which introduces a common interface which the exec layer accepts for compatibility. It's a good question overall!

  • What's the preference for organizing the codebase - separate packages for each component or a more monolithic approach with clear internal boundaries? Monolithic packages could be intimating to read/think trough (but if it's well organized that could help), I think both of them are fine, it's easier to start with a monolithic and cut that to pieces i guess :)

For existing users, what would be the ideal migration path to minimize disruption?

  • I talked about this above, but for us: Add a new dependency -> configure it to process specific requests (file), groups (folders) and how they are processed (generates everything, only responses or vars and responses etc -> use an auto fix for replace the old code (or just a linter warning maybe) -> adjust our own code to the changes (if generated structure is the same, most probably only type references needs be changed) -> build and repeat :)

regarding build runner, it's probably a possibility to create a runner which does nothing but forwards the generator part to the custom one, keeping everyone happy :)

davidfoldiotp avatar May 11 '25 08:05 davidfoldiotp

regarding build runner, it's probably a possibility to create a runner which does nothing but forwards the generator part to the custom one, keeping everyone happy :)

That might actually be a bit tricky, since build_runner tries to cache outputs, and one just forwards to the custom generator and don't read the graphql files with the built_runner API, it will assume that there are no changed inputs so it would always use the cached outputs.

knaeckeKami avatar May 14 '25 12:05 knaeckeKami

I'm a fan of freezed and discovered built_value with ferry and well it's...quite filthy. Fortunately I'm working on a pretty small app so I don't have the perspective you have about freezed limitation.

woprandi avatar Aug 21 '25 08:08 woprandi