Allow reading outputs from a previous run

Open Rexios80 opened this issue 5 months ago • 14 comments

For the hive_ce_generator, I am generating a schema file to keep track of type ids and field indices. I need to be able to read the previous version of this file and write it back out with changes. In the current implementation of build_runner this seems to be impossible, since all expected outputs are immediately deleted.

For now I am using dart:io directly, which will become a problem if https://github.com/dart-lang/build/issues/4094 ever goes through.

Rexios80 avatar Jul 21 '25 19:07 Rexios80
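For context, a minimal sketch of the kind of workaround described above: dart:io is used to read the previous output because expected outputs are deleted before the builder runs. The builder class, buildExtensions, output path, and merge helper are illustrative placeholders, not the actual hive_ce_generator code.

```dart
import 'dart:io';

import 'package:build/build.dart';

/// Illustrative sketch only; names, paths, and merge logic are placeholders.
class SchemaBuilder implements Builder {
  @override
  final buildExtensions = const {
    r'$package$': ['hive/hive_schema.yaml'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    final output = AssetId(buildStep.inputId.package, 'hive/hive_schema.yaml');

    // The previous schema cannot be read through buildStep.readAsString,
    // because build_runner deletes expected outputs before running the step.
    // Hence the dart:io workaround mentioned above.
    final previousFile = File(output.path);
    final previousSchema =
        previousFile.existsSync() ? previousFile.readAsStringSync() : '';

    // Merge previously assigned type ids and field indices with whatever
    // classes were discovered in this run (analysis elided).
    final newSchema = mergeSchemas(previousSchema, discovered: '');

    await buildStep.writeAsString(output, newSchema);
  }

  // Placeholder for the real merge logic.
  String mergeSchemas(String previous, {required String discovered}) =>
      discovered;
}
```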

Thanks for filing.

The problem with allowing reading previous outputs is that it makes the build no longer reproducible: you can get a different result every time you build, because it can change based on the previous result.

Still, I hope I can help figure out a good way to do it :)

Why not write the whole file each time? Is it a matter of performance?

davidmorgan avatar Jul 22 '25 12:07 davidmorgan

Why not write the whole file each time? Is it a matter of performance?

The content of the old file is used as input to determine the content of the new file. The generator needs to know what classes were assigned which type ids and which fields were assigned which indices.

Rexios80 avatar Jul 22 '25 12:07 Rexios80

Ah, for serialization compatibility?

I think the right way to think of that is as an input.

For example protocol buffer definition files require you to write a "tag number" on each field:

https://protobuf.dev/programming-guides/proto3/

so you could just require people to write them everywhere :)

For convenience though it makes sense that you want them to be generated.

I think a good way to do that would be this new feature:

https://github.com/dart-lang/build/issues/4002

The idea is that generators can suggest changes to user source.

So for example the generator could say "here are all the new tags" and then you are either configured to accept changes automatically or you get a chance to choose "accept change".

Without a new feature I don't think there's a supported way to do exactly what you're doing; serializers sometimes accomplish the same thing with names or with hashes but both are different to incrementally-assigned IDs, obviously :)

davidmorgan avatar Jul 22 '25 12:07 davidmorgan

Some form of persisted state would be cool though.

I could think of some use-cases for it. For example, one generator could list all the DB tables in the app and generate migrations for them. And if a user removes a table from their code, the only way for the generator to know it has been deleted would be to read the old generated code and diff the old/new lists.

Although I'm not sure how good the experience would be there, since it's unclear when a dev would "commit" a change. And we wouldn't want to migrate stuff that has never been shipped to begin with.

rrousselGit avatar Jul 22 '25 12:07 rrousselGit

Some form of persisted state would be cool though.

I could think of some use-cases for it. For example, one generator could list all the DB tables in the app and generate migrations for them. And if a user removes a table from their code, the only way for the generator to know it has been deleted would be to read the old generated code and diff the old/new lists.

Although I'm not sure how good the experience would be there, since it's unclear when a dev would "commit" a change. And we wouldn't want to migrate stuff that has never been shipped to begin with.

Hmmm if the generator generates a script, you run the script and it updates both DB and a schema ... that way the schema can be an input.

davidmorgan avatar Jul 22 '25 12:07 davidmorgan

so you could just require people to write them everywhere :)

This generator was created specifically to remove the need for annotations on every type and field. The removal of this constraint allows you to generate on classes outside of your control, such as classes generated by the OpenAPI generator, where you do not have the ability to add annotations.

The file in question does exist to allow for a sort of "migration" when modifying types and fields.

And we wouldn't want to migrate stuff that has never been shipped to begin with.

This is something I've thought about a lot. For my use-case I believe it is on the developer to understand when they should commit to their schema changes. Since my "migrations" are handled by a single generated file I believe this is perfectly reasonable. If it was any more complicated I would agree this is bad UX.

Rexios80 avatar Jul 22 '25 13:07 Rexios80

if the generator generates a script

This is actually a great idea, but then we have to rely on developers remembering to run the script.

Rexios80 avatar Jul 22 '25 13:07 Rexios80

so you could just require people to write them everywhere :)

This generator was created specifically to remove the need for annotations on every type and field. The removal of this constraint allows you to generate on classes outside of your control, such as classes generated by the OpenAPI generator, where you do not have the ability to add annotations.

The file in question does exist to allow for a sort of "migration" when modifying types and fields.

And we wouldn't want to migrate stuff that has never been shipped to begin with.

This is something I've thought about a lot. For my use-case I believe it is on the developer to understand when they should commit to their schema changes. Since my "migrations" are handled by a single generated file I believe this is perfectly reasonable. If it was any more complicated I would agree this is bad UX.

Yes, that makes sense.

So I think #4002 on that single generated file would do what you need?

davidmorgan avatar Jul 23 '25 07:07 davidmorgan

That sounds like it might work, as long as the file isn't treated as generated and thus deleted on the next run.

Rexios80 avatar Jul 23 '25 11:07 Rexios80

Although running the generator kind of is the migration, so I'm not sure about requiring user action to update the schema after the migration is already complete. This could lead to corrupted data.

Rexios80 avatar Jul 23 '25 14:07 Rexios80

How about having buildStep.writeAsString accept an optional signature or digest parameter? Then, in future builds, we could use something like buildStep.readCachedAsString(id, signature) to retrieve a file from a previous build, ensuring determinism by requiring the signature to match for the cache to be valid. This would also allow each build to customize how cached assets are reused.

gmpassos avatar Jul 23 '25 22:07 gmpassos
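To make the proposal above concrete, here is a hypothetical sketch of what such an API surface might look like. None of this exists in package:build today; the proposal actually suggests adding the signature parameter to writeAsString itself, but separate method names are used here to keep the sketch self-contained.

```dart
import 'package:build/build.dart';

/// Hypothetical API surface only; not part of package:build.
abstract class CachingBuildStep implements BuildStep {
  /// Writes [contents] and records [signature] for use by later builds.
  Future<void> writeAsStringSigned(AssetId id, String contents,
      {required String signature});

  /// Returns the contents of [id] from a previous build, or null if there is
  /// no cached asset or its recorded signature does not match [signature].
  Future<String?> readCachedAsString(AssetId id, String signature);
}

/// Illustrative flow: regenerate only when the builder-defined signature
/// changes, otherwise reuse the previous output.
Future<void> buildWithCache(CachingBuildStep buildStep) async {
  final output = buildStep.inputId.changeExtension('.g.dart');

  // Each builder decides what the signature covers, e.g. only class shapes
  // and annotated members, so irrelevant edits do not invalidate the cache.
  final signature = await computeSignature(buildStep);

  final cached = await buildStep.readCachedAsString(output, signature);
  final contents = cached ?? await generate(buildStep);
  await buildStep.writeAsStringSigned(output, contents, signature: signature);
}

// Placeholders for builder-specific logic.
Future<String> computeSignature(BuildStep buildStep) async => '';
Future<String> generate(BuildStep buildStep) async => '';
```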

How about having buildStep.writeAsString accept an optional signature or digest parameter? Then, in future builds, we could use something like buildStep.readCachedAsString(id, signature) to retrieve a file from a previous build, ensuring determinism by requiring the signature to match for the cache to be valid. This would also allow each build to customize how cached assets are reused.

I'm not sure I follow, how would the builder know the signature?

davidmorgan avatar Jul 24 '25 06:07 davidmorgan

I'm not sure I follow, how would the builder know the signature?

The build framework simply stores the signature provided by the builder. Each builder implementation must define its own rules for computing the signature based on its specific inputs. For each build step, instead of only generating and writing the asset, the builder can compute the signature and check whether a matching cached asset already exists. This cached asset can be reused to avoid regeneration, or another cached asset can be used as input to compute the current asset.

For example, in the reflection_factory package, comments and method bodies don't affect the generated code, so it may be useful to generate a signature based only on top-level classes, method names, and parameters to avoid unnecessary regeneration. Also, only classes annotated with @EnableReflection() or @ReflectionBridge() trigger reflection code generation, so any other code not related to these annotated classes is ignored.

gmpassos avatar Jul 24 '25 20:07 gmpassos

I'm not sure I follow, how would the builder know the signature?

The build framework simply stores the signature provided by the builder. Each builder implementation must define its own rules for computing the signature based on its specific inputs. For each build step, instead of only generating and writing the asset, the builder can compute the signature and check whether a matching cached asset already exists. This cached asset can be reused to avoid regeneration, or another cached asset can be used as input to compute the current asset.

For example, in the reflection_factory package, comments and method bodies don't affect the generated code, so it may be useful to generate a signature based only on top-level classes, method names, and parameters to avoid unnecessary regeneration. Also, only classes annotated with @EnableReflection() or @ReflectionBridge() trigger reflection code generation, so any other code not related to these annotated classes is ignored.

You can do this today by splitting into two builders: the first one pulls data from the source and writes it to a (hidden) file, and the second one reads only that file and does the generation. build_runner will then do what you're describing, only rerunning the second step if the data from the first step is unchanged.

But it usually won't help: usually it's the analysis that takes time, not generating code once analysis is done.

I do plan to spend time looking at popular builders and figuring out if they can be made faster; in some cases this might be the answer. But mostly what I'm expecting to see is shallow issues. For example, there was a dartfmt issue with the old formatting style where it would be very slow when formatting some particular patterns of code. The fix there is to switch your builder to use the new formatter style :)

davidmorgan avatar Jul 25 '25 06:07 davidmorgan
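As an illustration of the two-builder split described in the previous comment, here is a minimal sketch. The builder names, the .schema_info extension, and the extraction/generation helpers are placeholders; the intermediate file would typically be kept hidden by declaring build_to: cache for the first builder in build.yaml.

```dart
import 'package:build/build.dart';

/// Step 1: extract only the data the generator depends on, so that
/// unrelated edits to the source do not change this intermediate file.
class ExtractInfoBuilder implements Builder {
  @override
  final buildExtensions = const {
    '.dart': ['.schema_info'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    final source = await buildStep.readAsString(buildStep.inputId);
    await buildStep.writeAsString(
      buildStep.inputId.changeExtension('.schema_info'),
      extractRelevantInfo(source),
    );
  }
}

/// Step 2: generate code from the intermediate file only. Because this
/// builder reads nothing else, build_runner can skip it whenever the
/// .schema_info file is unchanged from the previous build.
class GenerateFromInfoBuilder implements Builder {
  @override
  final buildExtensions = const {
    '.schema_info': ['.g.dart'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    final info = await buildStep.readAsString(buildStep.inputId);
    await buildStep.writeAsString(
      buildStep.inputId.changeExtension('.g.dart'),
      generateCode(info),
    );
  }
}

// Placeholders for builder-specific logic.
String extractRelevantInfo(String source) => source;
String generateCode(String info) => '// generated\n';
```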