Vendoring of foreign source trees with leading underscores
Following up on https://github.com/ocaml/dune/issues/4795, dune is still lacking convenient way to manage vendored foreign source trees that might include underscores.
I'm currently vendoring a bunch of Rust crates into my OCaml project and use the following stanza to depend on it in the rule, invoking cargo: (source_tree ../vendor). After adding new dependencies, cargo started to complain about missing files, after quick research (good thing I know dune has some oddities regarding underscores!) I found the issue linked above and applied a fix in my dune file:
(vendored_dirs vendor)
(subdir vendor/futures-core/src/task/ (dirs :standard __internal))
This behavior with underscores is certainly confusing, and lack of decent workaround is disappointing.
Indeed, Dune by default ignores any directory starting with _. I don't remember if there is a technical reason to do so. One directory we want to ignore for sure is _build, but perhaps we don't need to ignore every other directory starting with _ by default (althought I bet there are builds in the wild that depend on this behaviour).
@Lupus how would you improve the current solution for changing the directory mask?
If there is no need to ignore anything else than _build, I would probably just stop doing that. It's a higly confusing impilit behavior.
What about directories that stard with a .? We also ignore those by default.
I've also got bitten by that when tried to tell dune that I need my .cargo dir in the build environment as a dependency. It was just silently ignored when I had it explicitly written in a rule, no errors, happy debugging. Maybe there should be some stanza to specify which source trees should include all that's found inside verbatim without any filtering. data_only_dir might be that stanza actually, as cargo sources are just data for dune, as well as cargo configs in .cargo folder.
Another possibility would be to only change the default glob for subdirectories of those specified in vendored_dirs.
Right now I use vedored_dirs to mark my cargo vendor folder, copied that from some of the templates for cargo integration available online. That template also uses source_tree stanza to depend on the whole vendor folder in rules, invoking cargo. So yeah, vendored_dirs not ignoring some files because of . or _ would work for me, not sure if it might break something, as vendored_dirs are also used for vendored dune projects.
Another possibility would be to only change the default glob for subdirectories of those specified in
vendored_dirs.
As clever as that may sound, it's going to surprise someone else down the line and that someone will create a ticket very similar to this one one day.
I think removing all the globs from the default set of patterns is the right first step. We can ignore _build, _opam, .git,but nothing like [._]*
Sounds good to me!
Is there anyone working on the fix for this? If not, is it possible to get some guidance if the feature given above is something wanted?
If not, is it possible to get some guidance if the feature given above is something wanted?
Yes, I think a PR in this direction would be accepted. The change probably needs to be versioned though.