vcstool
vcstool copied to clipboard
Add 'extends' feature to overlay repos files when importing
This adds a new extends:
root-level key for .repos
files, e.g.
# extension.repos
extends: base.repos
repositories:
a/repo:
type: git
url: https://github.com/a/repo.git
version: my-branch
The base.repos
file could look like:
# base.repos
repositories:
a/repo:
type: git
url: https://github.com/a/repo.git
version: master
another/repo:
type: git
url: https://github.com/another/repo.git
version: 1.2.3
Running vcs import --input extension.repos
would checkout a/repo
@ my-branch
(instead of master
) as well as another/repo
@ 1.2.3
.
This is motivated by ROS 2 development. As a concrete example, the master
ros2.repos
file tracks a specific ros2_tracing
tag. As a developer of that repository, I'd like to maintain a workspace based on that master
ros2.repos
file (which, for practical reasons, I don't want to modify locally), but using the master
branch of ros2_tracing
. This is why this solution uses an extension/overlay approach. Otherwise, when a new repo is added to ros2.repos
, I update my local ros2.repos
file (e.g. with git pull
), run vcs import ...
to update my workspace, and end up checking out the specific tag instead of staying on master
.
Although I haven't tested it, this should allow mixing .repos
and .rosinstall
files. Note that this means that vcstool
extends the rosinstall
format (with root-level extends:
).
Currently, an extension repos file still has to be valid by itself. Eventually, I'd like to change the import logic to only validate the result of the merge between the extension file and the base file so that the extension file doesn't need to re-define all required attributes (especially type
). However, this is for another PR.
Also, this could eventually fit nicely with #89, e.g. the extension .repos
file could provide the fork
remote on top of the original file's origin
remote.
Thank you for working on this. I can see how this feature would be valuable. Before going into the details of the PR I have a few high level thoughts:
- Atm
vcs import
only takes a single input file. Arguably it could be updated to allow passing multiple files. I would expect the semantic of that operation be the same as runningvcs import
multiple times - once for each input file - in the order they are passed. I would think this already satisfies your use case for a custom branch ofros2_tracing
. - So my main question would be what are the use cases the
extends
approach enables which aren't possible otherwise?- Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?
- Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
- Especially in combination with #156 I could see partial files being useful.
- Are there cases where specifying more than one file under
extends
would be useful? Should there a semantic difference between nested chaining vs. enumerating multipleextends
files as siblings?
Thanks @christophebedard and @dirk-thomas for the PR. Let me add my use case here: I maintain the gazebodistro set of files. There are many packages that use as dependency a subset of the other packages available in the same repository (i.e: ign-mathX uses ign-cmakeY, ign-transportZ uses ign-mathX and ign-cmakeY). There is a lot of configurations duplicated to reflect the dependency chain on every of the major versions released. As you can imaging maintaining that amount of duplicate configurations among multiple files is quite a nightmare. If I understand the PR correctly, with this approach I could do:
# ign-transportZ
repositories:
a/ign-cmakeY:
type: git
url: https://github.com/ign/ign-cmake.git
version: Y
a/ign-mathX:
type: git
url: https://github.com/ign/ign-math.git
version: X
a/ign-transportZ:
type: git
url: https://github.com/ign/ign-transport.git
version: Z
And will save me to list again that configurations if I use:
# foo software using only ign-transportZ
extends: ign-transportZ.repos
repositories:
a/ign-foo:
type: git
url: https://github.com/ign/ign-foo.git
version: lala
- So my main question would be what are the use cases the
extends
approach enables which aren't possible otherwise?
First of all, it of course allows the user to only run one command instead of vcs import
multiple times. Although no one would actually do it, this PR allows extending .repos
files ~infinitely, and running vcs import
~infinitely sounds like a pain :stuck_out_tongue_winking_eye:
My goal was to give the extension a meaning, if that makes sense. If you were to provide a list of .repos
files to --input
, they're only ever "linked" together when you run the vcs import --input a.repos b.repos ...
command. This way, whatever name you give the extension .repos
file will always be valid, since its link to the extended .repos
file is clearly defined with extends: base.repos
.
Also, in my use case above, and I guess in a lot of similar use cases, the relationship between the extension file and the extended file isn't going to change, so why not shorten the command?
And this also allows you to still easily provide the .repos
file (in this case the extension file) through stdin
, just like before.
- Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?
I think so.
- Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
- Especially in combination with #156 I could see partial files being useful.
Absolutely, with some changes to the parsing/validation logic.
- Are there cases where specifying more than one file under
extends
would be useful? Should there a semantic difference between nested chaining vs. enumerating multipleextends
files as siblings?
I can see that being useful, yes, e.g. if you only need to aggregate (and maybe add more repos on top) instead of needing to override versions. So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.
If I understand the PR correctly, with this approach I could do:
Yep!
And, to reference my reply above, if the extension file has a meaning by itself, since you're only combining repos here and not overriding versions, you could instead use one .repos
file which aggregates your two .repos
files (if the enumeration/aggregation thing gets added).
So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.
I would assume the following semantic:
- for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and
- for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.
(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)
- for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and
Indeed. In this case, the file listed first (top to bottom) could have priority.
- for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.
Exactly, with the order being from leaf (extended file) to root (extension file), like in the tests I've written.
Also, this starts getting more and more complex, but should "deletions" be allowed, i.e. removing a repo that is in the leaf/extended file using a special key/value in the extension file?
(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)
I should've opened an issue before the PR :stuck_out_tongue:
@dirk-thomas friendly ping. What's the next step?
Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?
Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?
While probably not impossible, I'm not sure that this fits vcstool
's architecture. Besides, I'm not sure what meaning a type: vcs
entry's path would have.
Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files(). To take @j-rivero's example above:
# foo software using only ign-transportZ
repositories:
ign-transportZ.repos:
type: vcs
url: ign-transportZ.repos
a/ign-foo:
type: git
url: https://github.com/ign/ign-foo.git
version: lala
Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files().
Sorry, I'm a bit confused, because that's kind of what this PR does. I'm just not sure if you're only talking about how we declare these dependencies in a yaml
format in a .repos
file.
We could just pre-process like I did for this PR and replace an entry of type: vcs
under repositories:
by the repositories contained in that file and do it recursively, etc. However, repeating the path and using type: vcs
seems clunky, e.g.:
extends: ign-transportZ.repos
repositories:
...
vs
repositories:
ign-transportZ.repos:
type: vcs
url: ign-transportZ.repos
...
However, if you're talking about using an actual vcs
client (like the ones here: https://github.com/dirk-thomas/vcstool/tree/2e90383a09e810049ee505e1e618690957c631dc/vcstool/clients), I don't think that would be ideal, from what I can tell. Here's an example:
# base.repos
repositories:
a/repo:
type: git
url: https://github.com/a/repo.git
version: master
another/repo:
type: git
url: https://github.com/another/repo.git
version: 1.2.3
# extension.repos
repositories:
base.repos:
type: vcs
url: base.repos
a/repo:
type: git
url: https://github.com/a/repo.git
version: my-branch
which is just the example I added to the README, but using type: vcs
to declare the extension.
Here, given extension.repos
as your entrypoint file, you have to make sure to process base.repos
before a/repo
, because the extension file modifies the original a/repo
's version:
(from master
to my-branch
). Dependencies are supported, e.g. here for nested repositories: https://github.com/dirk-thomas/vcstool/blob/2e90383a09e810049ee505e1e618690957c631dc/vcstool/commands/import_.py#L180
But then how does the vcs
client deal with "merging" the a/repo
entry in base.repos
and the a/repo
entry in extension.repos
so that we only have one single entry with version: my-branch
? From what I can tell, clients simply process their singular entry without knowing/caring about other entries (although they're called/executed in the right order according to the dependencies, of course).
Like I said, it might be doable given some work (e.g. by allowing clients to modify the jobs list), or maybe I'm just missing something. But to me it just seems simpler to not use an actual vcs
client and to pre-process to resolve all the extension declarations first and then call the clients to execute a "flat"/simple jobs list.
repositories: base.repos: type: vcs url: base.repos
This is less readable because you have to check whether the base.repos
is a file or a directory in the file system.
When the
extends: base.repos
...
gives you the exact meaning that the base.repos
is an extension file, so is more readable.