[feature] Consider enhancing conan.tools.files.get() to support Git repos in addition to compressed file formats

Open System-Arch opened this issue 1 year ago • 1 comments

What is your suggestion?

Conan 2 introduced the ability to locally cache a CCI recipe's source content, which helps reuse CCI recipes outside of CCI, but it is still difficult to use these recipes with (a fork of) the underlying Git source repo without modifying or replacing the typical source() method. On the plus side, many recipes now contain boilerplate code that simply calls the conan.tools.files.get function with the appropriate arguments from the conandata.yml file. It would be nice if one could simply replace or augment the entries in the conandata.yml file to specify the URL of a Git repo (e.g., https://github.com/madler/zlib.git) and the SHA of the commit to check out.

In pseudo code, this suggestion might look something like:

from conan.tools.scm import Git

def get(conanfile, url, md5=None, sha1=None, sha256=None, destination=".", filename="",
        keep_permissions=False, pattern=None, verify=True, retry=None, retry_wait=None,
        auth=None, headers=None, strip_root=False):
    if url.endswith(".git"):
        git = Git(conanfile)  # by default, the current folder "."
        git.clone(url=url, target=destination)  # git clone url target
        # point the helper at the clone so that the next command, "checkout", runs inside it
        git.folder = destination  # cd target
        git.checkout(commit=sha1)  # git checkout commit (sha1 is reused here as the commit ID)
    else:
        <existing implementation of "get">

NOTE: There are more efficient ways to implement this functionality as suggested in https://stackoverflow.com/questions/31278902/how-to-shallow-clone-a-specific-commit-with-depth-1
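As a rough illustration of that more efficient approach, here is a minimal sketch that fetches a single commit without history using plain git commands. The helper names shallow_checkout_commands and shallow_checkout are hypothetical and not part of Conan, and the approach assumes the server allows fetching by commit ID.

```python
import subprocess


def shallow_checkout_commands(url, commit, destination="."):
    """Return the git invocations that fetch exactly one commit, without history.

    Hypothetical helper for illustration only; not part of Conan.
    """
    return [
        ["git", "init", destination],
        ["git", "-C", destination, "remote", "add", "origin", url],
        # Fetch only the requested commit; the server must permit fetching by commit ID.
        ["git", "-C", destination, "fetch", "--depth", "1", "origin", commit],
        ["git", "-C", destination, "checkout", "FETCH_HEAD"],
    ]


def shallow_checkout(url, commit, destination="."):
    """Run the commands above, raising CalledProcessError on the first failure."""
    for cmd in shallow_checkout_commands(url, commit, destination):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the sequence easy to inspect or log before anything touches the network.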

And, of course, a nice accompanying request would be the ability to specify (perhaps through an environment variable) an alternate location for the conandata.yml file, so as to eliminate the need to modify even that aspect of a recipe.

I am interested to hear your thoughts on this suggestion and to know whether such an idea has been requested by others or considered (I didn't see anything obvious when searching through existing issues). Thanks

Have you read the CONTRIBUTING guide?

  • [X] I've read the CONTRIBUTING guide

System-Arch avatar Aug 28 '24 20:08 System-Arch

One addition: this could also enable the automatic source backup if it's a Git checkout, which would be cool.

tbsuht avatar Aug 29 '24 07:08 tbsuht

I agree this could be a nice addition, thanks for the suggestion @System-Arch

The only issue is that it seems we will not be able to prioritize it in the short term (marking it as long-term roadmap 2.X). There are many other higher priorities, and this one:

  • is a nice-to-have, but not a blocker
  • is already possible to implement, just more verbosely and with more custom lines in recipes
  • Most Git servers like GitHub and GitLab have HTTP download URLs for specific commits, tags, and releases, and those already support backup sources and caching
  • The implementation is not straightforward, especially the cache and backup, as this feature uses binary blobs with checksums as storage, which cannot be directly mapped to git clones/checkouts, so it will be necessary to invent some mechanism there.
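To illustrate the download-URL point, here is a minimal sketch that turns a GitHub clone URL plus a commit or tag into the corresponding archive URL, which could then go into conandata.yml together with its sha256. The function name github_archive_url is hypothetical, and the URL layout shown is GitHub-specific; other servers use different paths.

```python
def github_archive_url(repo_url, ref):
    """Build a GitHub tarball URL for a commit ID or tag.

    Hypothetical helper for illustration; assumes a GitHub-hosted repository.
    """
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}/archive/{ref}.tar.gz"


# Example: the zlib repository at tag v1.3.1
print(github_archive_url("https://github.com/madler/zlib.git", "v1.3.1"))
```

The checksum of the downloaded archive still has to be computed once and recorded next to the URL so that the existing get() verification and backup apply.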

memsharded avatar Aug 29 '24 09:08 memsharded

I was just looking for a way to back up sources in a case where I have no control over the recipe's source() method, which uses a Git commit. Some thoughts:

> Most Git servers like GitHub and GitLab have HTTP download URLs for specific commits, tags, and releases, and those already support backup sources and caching

Since some projects also use Git LFS, this might be an issue, because whether the LFS objects are included in the tarballs depends on the hosting provider: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/managing-git-lfs-objects-in-archives-of-your-repository

> The implementation is not straightforward, especially the cache and backup, as this feature uses binary blobs with checksums as storage, which cannot be directly mapped to git clones/checkouts, so it will be necessary to invent some mechanism there.

The problem is that in the recipe we need to know the key for the backup sources (from documentation):

> In your recipe’s source() method, ensure the relevant get/download calls supply the sha256 signature of the downloaded files.

So in the case of a Git repository, the recipe might contain the commit ID to check out, which we could use as the SHA signature (but this requires the full commit ID, not just the short one). But what if it's a tag? In that case, I can't know the commit ID without contacting the original Git repository, yet the commit ID would be required to get the sources from the backup.
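A small sketch of the distinction being made here: checking whether a reference is a full commit ID rather than a short ID or a tag. The name is_full_commit_id is hypothetical, and the check assumes a repository using 40-character SHA-1 object IDs.

```python
import re

# Assumption: SHA-1 repositories, where a full commit ID is 40 hex characters.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_commit_id(ref):
    """Return True only for a full 40-character hexadecimal commit ID."""
    return _FULL_SHA1.fullmatch(ref.lower()) is not None
```

Anything that fails this check (a short ID, a tag, a branch name) cannot be used directly as a backup key without first resolving it against the original repository.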

Or we could generate a SHA signature from the repository URL plus the tag or commit to check out, then zip the checked-out commit and save it under that signature in the backup sources. The next time somebody needs the same repository URL and tag/commit, it would yield the same signature and could be found in the backup sources.
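A minimal sketch of that second idea, assuming the key is simply the SHA-256 of the repository URL and the reference joined by "#". The function name backup_key, the separator, and the URL normalization are all assumptions for illustration, not an existing Conan mechanism.

```python
import hashlib


def backup_key(repo_url, ref):
    """Derive a deterministic lookup key from a repository URL and a tag or commit.

    Hypothetical scheme: sha256 over "<url>#<ref>" after trimming whitespace
    and trailing slashes from the URL.
    """
    normalized = repo_url.strip().rstrip("/")
    return hashlib.sha256(f"{normalized}#{ref}".encode("utf-8")).hexdigest()


key = backup_key("https://github.com/madler/zlib.git", "v1.3.1")
print(key)  # 64 hex characters, identical for every caller with the same inputs
```

Note that such a key identifies the request, not the content: if a tag is moved upstream, the same key would point at a stale archive, so it is a lookup key rather than an integrity checksum.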

jokram avatar Nov 14 '25 12:11 jokram