vcstool icon indicating copy to clipboard operation
vcstool copied to clipboard

Add 'extends' feature to overlay repos files when importing

Open christophebedard opened this issue 4 years ago • 13 comments

This adds a new extends: root-level key for .repos files, e.g.

# extension.repos
extends: base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version: my-branch

The base.repos file could look like:

# base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version:  master
  another/repo:
    type: git
    url: https://github.com/another/repo.git
    version: 1.2.3

Running vcs import --input extension.repos would checkout a/repo @ my-branch (instead of master) as well as another/repo @ 1.2.3.

This is motivated by ROS 2 development. As a concrete example, the master ros2.repos file tracks a specific ros2_tracing tag. As a developer of that repository, I'd like to maintain a workspace based on that master ros2.repos file (which, for practical reasons, I don't want to modify locally), but using the master branch of ros2_tracing. This is why this solution uses an extension/overlay approach. Otherwise, when a new repo is added to ros2.repos, I update my local ros2.repos file (e.g. with git pull), run vcs import ... to update my workspace, and end up checking out the specific tag instead of staying on master.

Although I haven't tested it, this should allow mixing .repos and .rosinstall files. Note that this means that vcstool extends the rosinstall format (with root-level extends:).

Currently, an extension repos file still has to be valid by itself. Eventually, I'd like to change the import logic to only validate the result of the merge between the extension file and the base file so that the extension file doesn't need to re-define all required attributes (especially type). However, this is for another PR.

christophebedard avatar Jun 13 '20 17:06 christophebedard

Also, this could eventually fit nicely with #89, e.g. the extension .repos file could provide the fork remote on top of the original file's origin remote.

christophebedard avatar Jun 18 '20 13:06 christophebedard

Thank you for working on this. I can see how this feature would be valuable. Before going into the details of the PR I have a few high level thoughts:

  • Atm vcs import only takes a single input file. Arguably it could be updated to allow passing multiple files. I would expect the semantic of that operation be the same as running vcs import multiple times - once for each input file - in the order they are passed. I would think this already satisfies your use case for a custom branch of ros2_tracing.
  • So my main question would be what are the use cases the extends approach enables which aren't possible otherwise?
    • Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?
    • Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
    • Especially in combination with #156 I could see partial files being useful.
  • Are there cases where specifying more than one file under extends would be useful? Should there a semantic difference between nested chaining vs. enumerating multiple extends files as siblings?

dirk-thomas avatar Jul 06 '20 23:07 dirk-thomas

Thanks @christophebedard and @dirk-thomas for the PR. Let me add my use case here: I maintain the gazebodistro set of files. There are many packages that use as dependency a subset of the other packages available in the same repository (i.e: ign-mathX uses ign-cmakeY, ign-transportZ uses ign-mathX and ign-cmakeY). There is a lot of configurations duplicated to reflect the dependency chain on every of the major versions released. As you can imaging maintaining that amount of duplicate configurations among multiple files is quite a nightmare. If I understand the PR correctly, with this approach I could do:

# ign-transportZ
repositories:
 a/ign-cmakeY:
   type: git
   url: https://github.com/ign/ign-cmake.git
   version: Y
 a/ign-mathX:
   type: git
   url: https://github.com/ign/ign-math.git
   version: X
 a/ign-transportZ:
   type: git
   url: https://github.com/ign/ign-transport.git
   version: Z

And will save me to list again that configurations if I use:

# foo software using only ign-transportZ
extends: ign-transportZ.repos
repositories:
 a/ign-foo:
   type: git
   url: https://github.com/ign/ign-foo.git
   version: lala

j-rivero avatar Jul 07 '20 19:07 j-rivero

  • So my main question would be what are the use cases the extends approach enables which aren't possible otherwise?

First of all, it of course allows the user to only run one command instead of vcs import multiple times. Although no one would actually do it, this PR allows extending .repos files ~infinitely, and running vcs import ~infinitely sounds like a pain :stuck_out_tongue_winking_eye:

My goal was to give the extension a meaning, if that makes sense. If you were to provide a list of .repos files to --input, they're only ever "linked" together when you run the vcs import --input a.repos b.repos ... command. This way, whatever name you give the extension .repos file will always be valid, since its link to the extended .repos file is clearly defined with extends: base.repos.

Also, in my use case above, and I guess in a lot of similar use cases, the relationship between the extension file and the extended file isn't going to change, so why not shorten the command?

And this also allows you to still easily provide the .repos file (in this case the extension file) through stdin, just like before.

  • Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?

I think so.

  • Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
  • Especially in combination with #156 I could see partial files being useful.

Absolutely, with some changes to the parsing/validation logic.

  • Are there cases where specifying more than one file under extends would be useful? Should there a semantic difference between nested chaining vs. enumerating multiple extends files as siblings?

I can see that being useful, yes, e.g. if you only need to aggregate (and maybe add more repos on top) instead of needing to override versions. So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.

christophebedard avatar Jul 07 '20 19:07 christophebedard

If I understand the PR correctly, with this approach I could do:

Yep!

And, to reference my reply above, if the extension file has a meaning by itself, since you're only combining repos here and not overriding versions, you could instead use one .repos file which aggregates your two .repos files (if the enumeration/aggregation thing gets added).

christophebedard avatar Jul 07 '20 19:07 christophebedard

So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.

I would assume the following semantic:

  • for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and
  • for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.

(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)

dirk-thomas avatar Jul 07 '20 21:07 dirk-thomas

  • for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and

Indeed. In this case, the file listed first (top to bottom) could have priority.

  • for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.

Exactly, with the order being from leaf (extended file) to root (extension file), like in the tests I've written.

Also, this starts getting more and more complex, but should "deletions" be allowed, i.e. removing a repo that is in the leaf/extended file using a special key/value in the extension file?

(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)

I should've opened an issue before the PR :stuck_out_tongue:

christophebedard avatar Jul 08 '20 12:07 christophebedard

@dirk-thomas friendly ping. What's the next step?

christophebedard avatar Sep 25 '20 19:09 christophebedard

Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?

dgpetrie avatar Feb 02 '21 00:02 dgpetrie

Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?

While probably not impossible, I'm not sure that this fits vcstool's architecture. Besides, I'm not sure what meaning a type: vcs entry's path would have.

christophebedard avatar Feb 02 '21 20:02 christophebedard

Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files(). To take @j-rivero's example above:

# foo software using only ign-transportZ
repositories:
  ign-transportZ.repos:
    type: vcs
    url: ign-transportZ.repos
 a/ign-foo:
   type: git
   url: https://github.com/ign/ign-foo.git
   version: lala

dgpetrie avatar Feb 03 '21 20:02 dgpetrie

Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files().

Sorry, I'm a bit confused, because that's kind of what this PR does. I'm just not sure if you're only talking about how we declare these dependencies in a yaml format in a .repos file.

We could just pre-process like I did for this PR and replace an entry of type: vcs under repositories: by the repositories contained in that file and do it recursively, etc. However, repeating the path and using type: vcs seems clunky, e.g.:

extends: ign-transportZ.repos
repositories:
  ...

vs

repositories:
  ign-transportZ.repos:
    type: vcs
    url: ign-transportZ.repos
  ...

However, if you're talking about using an actual vcs client (like the ones here: https://github.com/dirk-thomas/vcstool/tree/2e90383a09e810049ee505e1e618690957c631dc/vcstool/clients), I don't think that would be ideal, from what I can tell. Here's an example:

# base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version:  master
  another/repo:
    type: git
    url: https://github.com/another/repo.git
    version: 1.2.3
# extension.repos
repositories:
  base.repos:
    type: vcs
    url: base.repos
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version: my-branch

which is just the example I added to the README, but using type: vcs to declare the extension.

Here, given extension.repos as your entrypoint file, you have to make sure to process base.repos before a/repo, because the extension file modifies the original a/repo's version: (from master to my-branch). Dependencies are supported, e.g. here for nested repositories: https://github.com/dirk-thomas/vcstool/blob/2e90383a09e810049ee505e1e618690957c631dc/vcstool/commands/import_.py#L180

But then how does the vcs client deal with "merging" the a/repo entry in base.repos and the a/repo entry in extension.repos so that we only have one single entry with version: my-branch? From what I can tell, clients simply process their singular entry without knowing/caring about other entries (although they're called/executed in the right order according to the dependencies, of course).

Like I said, it might be doable given some work (e.g. by allowing clients to modify the jobs list), or maybe I'm just missing something. But to me it just seems simpler to not use an actual vcs client and to pre-process to resolve all the extension declarations first and then call the clients to execute a "flat"/simple jobs list.

christophebedard avatar Feb 04 '21 21:02 christophebedard

repositories:
  base.repos:
    type: vcs
    url: base.repos

This is less readable because you have to check whether the base.repos is a file or a directory in the file system.

When the

extends: base.repos
...

gives you the exact meaning that the base.repos is an extension file, so is more readable.

andry81 avatar Aug 28 '23 12:08 andry81