dune icon indicating copy to clipboard operation
dune copied to clipboard

Add include of generated files

Open bobot opened this issue 3 years ago • 27 comments

This branch tries to add features needed for #3901 . Here my first difficulties:

  • The source tree is context independent, e.g what is a vendored directory doesn't depend on the Context.t.
    • I think it is a good thing, but it means that generated included files should not include (vendored ...) stanza since the file is generated in a context
  • The parsing of all the dune files is done in one call Dune_load.load for all the directories. So the generation of the included files creates a cycles with all the dune files, so with the inclusion of their content. Two possibilities:
    • Separate the loading of the dune files by directory.
    • Include the files later, for example when the rules are evaluated, it could restrict even more the possible stanza but it would change less things.

Currently instead of extending the (include file) stanza I added (include_generated file) in order to simplify the developement of the feature. And for the user it helps to separate the two cases.

bobot avatar Oct 18 '21 09:10 bobot

Currently instead of extending the (include file) stanza I added (include_generated file) in order to simplify the developement of the feature. And for the user it helps to separate the two cases.

That seems reasonable. Perhaps even guard the feature under a (using include_generated 0.1) in the dune-project file? So that we don't have to commit to stability and versioning too early.

ghost avatar Oct 19 '21 10:10 ghost

I'm going to dissent against having a separate stanza. There's a couple of reasons why I don't like it:

  • We aren't offering anything new conceptually, only removing an existing limitation. Why should we have 2 separate stanzas for the same construct in the language?
  • It is a bad UI for end users. Now they'll need to churn their dune files from include generated sources.

Perhaps even guard the feature under a (using include_generated 0.1) in the dune-project file? So that we don't have to commit to stability and versioning too early.

This can be done without a separate stanza. It seems simple to guard the feature behind a Path.Source.is_in_source_tree check when it's disabled.

rgrinberg avatar Oct 19 '21 19:10 rgrinberg

This can be done without a separate stanza. It seems simple to guard the feature behind a Path.Source.is_in_source_tree check when it's disabled.

I agree with these arguments.

nojb avatar Oct 19 '21 21:10 nojb

Do you still prefer the same stanza if one accept to include stanzas that the other do not? If people prefer the same stanza, I will move to it once the other questions are resolved.

Currently I'm a little at loss to find the right place or right way to break the circular dependency. Before Super_context would be great so that the include_generated is transparent, but the rules are generated by directory later. So it should be in Gen_rules but we already filtered stanzas, the stanzas are provided in the API of Super_context, and there is the add stanza in subdirectory. Do you have some ideas?

Do we have a mode in the build system, where the need to build something would be an error? It is for dune install it would need to check that includes generated are up-to-date but not build them. However stanzas that install are not the priority of #3901 so they can be forbidden in generated includes.

bobot avatar Oct 20 '21 07:10 bobot

You might need to ignore include_generated (or just include with a missing source file if we use the same stanza) during the first loading stage. And only evaluate and interpret it in gen_rules.ml when we actually produce the rules for the directory.

ghost avatar Oct 20 '21 10:10 ghost

Eventually, we should break down Super_context BTW. This thing is ugly and pre-dates Memo.

ghost avatar Oct 20 '21 10:10 ghost

Ok I did nearly that but I had to touch all the stanzas_in related stuff because it becames Memo.Lazy.t. It should be for the better. And it works sufficiently well for #3901 (tested with rules and local executable).

The generated dune file can't add things to the library database and artefact database (public executables).

A note, in main branch include stanza are interpreted in Sub_dirs (for plain dune file) and in Dune_file (only for OCaml code dune file). The only remaining advantage of the last one are that their generated stanza can be part of the databases.

bobot avatar Oct 22 '21 11:10 bobot

Good for review

bobot avatar Oct 22 '21 18:10 bobot

The generated dune file can't add things to the library database and artefact database (public executables).

Maybe not in this PR, but do you see a way to lift the restriction about libraries? This is one of the use-cases I had in mind for this feature...

nojb avatar Oct 22 '21 18:10 nojb

Lifiting the restriction for libraries is tricky for the implementation, but already at an high level there are traps. Indeed the set of available library would potentially change at each new dune file parsed. So it could change the meaning of %{lib-available:foo}. So such replacement and other optionality tests would wait until all the dune file are parsed to return false, but could return true as soon as the library is available. So it could be possible but making such library DB would be a great feat.

There are other questions:

  • include a file generated from a subdirectory (include subdir/generated-dune) is exclusive with env stanza present in generated dune file. Indeed the generation in subdir of generated-dune need an environment which is computed by taking into account all the environnement stanza present in parents directory.
  • subdir stanza can't be present in generated dune file because it is currently computed quite early, so we can't automatically generate rules in a tests hierarchy not statically known (promotion can still be used for this use case). At an high level point of view the difficulty is to know when all the stanza of a directory are known. We could force the subdirectories to wait for parents directory. It would forbid even more the (include subdir/generated-dune).
  • Dir_status has the same difficulty than environments

The summary is that Super_context and before (e.g Subdir.t) are still too much not Memoized for having a clear picture but the high level view is that the generation of the stanzas of a directory depends on the generation of the stanzas of parents directory and only (include ../generated-dune) are acceptable.

bobot avatar Oct 26 '21 10:10 bobot

Personally, I find some things to be quite confusing about this PR:

  • The interaction between include and subdirs. Is it ok to generate subdir stanzas? Is it ok to have generated include in a subdir stanza?

  • The distinction between parse and parse_generated. Why is this necessary? I would expect a single code path.

  • How does this work with -p?

Even before this PR, the code had some serious issues IMHO:

  • The handling of include is hacky and must be duplicated
  • Untracked IO is done on source file.

I don't think we need to solve all the technical problems to make this experimental feature complete, but we should at least not make the current code even more difficult to understand.

rgrinberg avatar Oct 28 '21 04:10 rgrinberg

Personally, I find some things to be quite confusing about this PR:

I agree that the scope of what we can do with generated includes is not very clear and very implementation dependent. Mainly because the first parts of the handling of dune file are not Memo aware, and so are monolithic.

The interaction between include and subdirs.

Is it ok to generate subdir stanzas?

No it will not work because subdir stanzas are handled before, for all the hierarchy at the same time. Perhaps reworking it like source_tree and env where the computation of the sub-directories just depends on the parents directory would solve most of the problems.

Is it ok to have generated include in a subdir stanza?

Yes

The distinction between parse and parse_generated. Why is this necessary? I would expect a single code path.

I did that at first, but since the first part of parsing is monolithic, it was complicating things without any advantage. I preferred not to complicate this part before an overall.

How does this work with -p?

If an executable needed for generating an include file is marked for a package not selected, the generation will fail. Usually such generation are done using private binaries. It is an additional limitation. We could add (package ...) to include in order to limit the generation when the package is selected, but for another PR it is very minor.

Even before this PR, the code had some serious issues IMHO:

I think an overall is needed but I need inputs from the authors to understand if I understand it correctly.

bobot avatar Nov 03 '21 15:11 bobot

Meeting notes:

  • The current approach offer too much restrictions, adding some limited scope and staging would be better
  • Prefered approach:
    • Add a (scoped_dir (dir <path>) (dune_file <path>) (exports (library <name>) (executable <name>)(package <name>))
      • The directory dir is not traversed until the dune_file has been generated and parsed
      • With dune_file taken into account dirmust define the libraryand executable declared exported, and could define install stanza for package.

bobot avatar Nov 25 '21 09:11 bobot

@rgrinberg @jeremiedimino Source_tree doesn't depend on the context (or build_context), right? So it seems that adding the support of scoped_dirs which depends on the build_context since the generation of the file depends on it requires more changes if done in Source_tree. Do you have some advice?

EDIT: Why does Build_system depends on Source_tree?

bobot avatar Nov 26 '21 09:11 bobot

@rgrinberg @jeremiedimino Source_tree doesn't depend on the context (or build_context), right?

That's right.

So it seems that adding the support of scoped_dirs which depends on the build_context since the generation of the file depends on it requires more changes if done in Source_tree. Do you have some advice?

The new feature I'm working on should help (see my comment on slack). In the meantime, I suggest to focus on scoped dirs that don't depend on the build context.

ghost avatar Nov 29 '21 13:11 ghost

@christinerose we decided to abandon the feature in its current form, so it's probably not worth reviewing the current PR.

ghost avatar Nov 29 '21 13:11 ghost

@bobot why do you want the scoped_dir status to depend on the build context BTW? Whether a directory is a scoped directory or not seems more like a universal property to me.

EDIT: Why does Build_system depends on Source_tree?

For various reasons. For instance, some internal behaviours of Dune are tied to the (lang dune x.y) version, and this version is obtained via the source tree. Though it doesn't have to: it could be returned by gen_rules or another callback.

ghost avatar Nov 29 '21 14:11 ghost

why do you want the scoped_dir status to depend on the build context BTW?

The generation of the files used by scoped_dir depends on the context, so since scoped_dir would allow to generate full featured dune_files. The presence of the stanza scoped_dir but also vendor, data_dir stanzas in a scoped directory depends on the context.

bobot avatar Nov 29 '21 14:11 bobot

, I suggest to focus on scoped dirs that don't depend on the build context.

I can try to start with an implementation of scoped_dir which disallow stanzas that change the status in the scope directory (= the attempt with include generated_dune_file + subdir stanza).

bobot avatar Nov 29 '21 14:11 bobot

That makes sense. To be fair, it does feel like we are stuffing Source_tree too much. Things like the vendored status or cram tests don't have much to do with Source_tree. Maybe even the dune-project. All this could be part of src/dune_rules.

ghost avatar Nov 29 '21 14:11 ghost

Indeed, the context independent and context dependent information should be separated inside (or from) Source_tree. I'm wondering if the context independent information is just Fs_cache.

bobot avatar Nov 29 '21 15:11 bobot

@christinerose we decided to abandon the feature in its current form, so it's probably not worth reviewing the current PR.

Should I dismiss the pending reviews then, or is this for new changes?

christinerose avatar Nov 29 '21 16:11 christinerose

@christinerose thank you for your review, you do not have to dismiss them. I will use your review for the documentation of the new feature.

bobot avatar Nov 29 '21 16:11 bobot

Great, @bobot. Thanks!

christinerose avatar Nov 29 '21 16:11 christinerose

Indeed, the context independent and context dependent information should be separated inside (or from) Source_tree. I'm wondering if the context independent information is just Fs_cache.

I think you mean Fs_memo. Fs_cache is a lower-level layer used to implement Fs_memo. But agreed that Source_tree might eventually fizzle out. Or at least move to dune_rules with a slight different API, since most of it is now specific to the Dune rules.

ghost avatar Nov 29 '21 17:11 ghost

@bobot can you rebase? I'd like to take another look at this PR. I think it should be somewhat cleaner now that we've removed unmemoized file access

rgrinberg avatar May 17 '22 15:05 rgrinberg

@bobot ping, are you planning to make progress with this?

Alizter avatar Sep 02 '22 16:09 Alizter