
Question: how is the Gitman cache updated?

Open · mastupristi opened this issue 8 months ago • 9 comments

Hi! I'm trying to better understand how Gitman's cache system works in practice.

As far as I can tell, the cache is created using:

git clone --mirror <repo> <cache_path>

This results in a bare mirror of the original repository stored locally (under .gitman_cache or $GITMAN_CACHE). This is very useful to speed up subsequent clones and to save bandwidth.
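To make the two sides concrete, here is a minimal Python sketch that only builds the git argument lists: one creates the mirror, the other clones a working copy that borrows objects from it via `--reference-if-able` (the option discussed later in this thread). The naming rule `<cache_root>/<last URL component>.reference` and the helper names are hypothetical, loosely modeled on the `.gitman_cache/<repo>.reference` path above; this is not Gitman's actual code.

```python
from pathlib import Path
from typing import List


def mirror_path(cache_root: str, repo_url: str) -> Path:
    # HYPOTHETICAL naming rule: <cache_root>/<last URL component>.reference
    name = repo_url.rstrip("/").split("/")[-1]
    # Also handle scp-like URLs without a slash, e.g. "host:repo.git".
    name = name.split(":")[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return Path(cache_root) / f"{name}.reference"


def create_mirror_cmd(repo_url: str, cache_root: str) -> List[str]:
    # Bare mirror: every ref (branches, tags, ...) is copied as-is.
    return ["git", "clone", "--mirror", repo_url, str(mirror_path(cache_root, repo_url))]


def clone_with_cache_cmd(repo_url: str, cache_root: str, dest: str) -> List[str]:
    # --reference-if-able borrows objects from the mirror through
    # .git/objects/info/alternates; if the mirror directory does not
    # exist, git skips it with a warning instead of aborting the clone.
    return [
        "git", "clone",
        "--reference-if-able", str(mirror_path(cache_root, repo_url)),
        repo_url, dest,
    ]
```

Each list can be passed to `subprocess.run(..., check=True)`. Note that nothing here ever touches the mirror again after it is created, which is exactly the question raised below.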

However, I couldn't find any documentation or code that clearly describes how and when the cache is refreshed after its first creation.

My questions:

  • Is the cache ever updated by Gitman once it's been created?
  • Is there any built-in mechanism to update or re-fetch tags/commits from the origin into the cache mirror?
  • Or is it up to the user to periodically run something like git fetch --all inside .gitman_cache/<repo>.reference?

Thanks a lot for this great tool, and for any clarification you can provide!

Max

mastupristi · Apr 24 '25 14:04

I'm pretty sure the cache is managed by git itself as a native feature, and updated from git pull.

jacebrowning · Apr 24 '25 17:04

Hi @jacebrowning

> I'm pretty sure the cache is managed by git itself as a native feature, and updated from git pull.

It seems that this is not the case:

https://stackoverflow.com/a/79591897/552247

Having said that, and leaving push aside, what would you think about including an automatic or manual mechanism in gitman to update the cache?

mastupristi · Apr 25 '25 08:04

Are you suggesting the cache does not actually serve the purpose of speeding up subsequent clones? I have never seen a need to manually update the cache.

jacebrowning · Apr 25 '25 13:04

Hi @jacebrowning

In my tests I confirmed that --reference-if-able does not update the reference repository when running git fetch or git push.

You can try it yourself using this script.

From the Git manual (git clone):
--reference[-if-able] <repository>
If the reference <repository> is on the local machine, automatically set up
.git/objects/info/alternates to obtain objects from the reference
<repository>. Using an already existing repository as an alternate will
require fewer objects to be copied from the repository being cloned, reducing
network and local storage costs.
https://git-scm.com/docs/git-clone#Documentation/git-clone.txt-code--reference-if-ableltrepositorygtcode

Thank you, @jacebrowning, for prompting this discussion. As the manual makes clear, --reference[-if-able] only shares object storage with the reference repo; it does not register that repo as a target for fetch or push. Therefore, the mirror must still be updated manually: new objects have to be fetched into it (or pushed to it) explicitly.
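The sharing the manual describes is visible in the clone's `.git/objects/info/alternates` file: it lists object directories, one per line, with no remote URL, which is why fetch and push never write to the mirror. A small Python sketch (the helper names are hypothetical, and it assumes a non-bare clone whose git dir is `<worktree>/.git`) that reads that file and checks whether a clone borrows from a given mirror:

```python
from pathlib import Path
from typing import List


def read_alternates(worktree: str) -> List[Path]:
    """Return the object directories listed in .git/objects/info/alternates."""
    alt_file = Path(worktree) / ".git" / "objects" / "info" / "alternates"
    if not alt_file.is_file():
        return []  # this clone does not borrow objects from anywhere
    paths = []
    for line in alt_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        p = Path(line)
        # Relative entries are relative to the clone's objects directory.
        if not p.is_absolute():
            p = (alt_file.parent.parent / p).resolve()
        paths.append(p)
    return paths


def uses_mirror(worktree: str, mirror: str) -> bool:
    # A bare mirror keeps its object store directly in <mirror>/objects.
    target = (Path(mirror) / "objects").resolve()
    return any(p.resolve() == target for p in read_alternates(worktree))
```

Nothing in that file tells git where the mirror's objects come from, so keeping them current is outside what `git fetch` in the working copy does.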

Best regards,
Max

mastupristi · Apr 25 '25 15:04

Sharing object storage isn't enough to speed up future clones?

jacebrowning · Apr 25 '25 17:04

Hi @jacebrowning

> Sharing object storage isn't enough to speed up future clones?

While sharing object storage via --reference[-if-able] does reduce the cost of copying objects at clone time, that benefit fades as the mirror grows stale. On an active repository, after a few months, a large share of the objects will have been created after the mirror was last updated, so future clones end up fetching more and more from the origin, eroding both the bandwidth and the disk-space savings.
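One way to gauge how stale a mirror is without fetching is to compare the refs the origin advertises (`git ls-remote <url>`) with those stored in the mirror (`git ls-remote <mirror>` also works on a local path). A Python sketch of that comparison; the helper names are hypothetical, and a differing ref only suggests that the mirror lacks the newer commits, it does not count missing objects:

```python
import subprocess
from typing import Dict, List


def parse_refs(output: str) -> Dict[str, str]:
    """Parse `git ls-remote` / `git show-ref` output into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # tab (ls-remote) or space (show-ref)
        if len(parts) == 2:
            sha, ref = parts
            refs[ref.strip()] = sha
    return refs


def stale_refs(origin_output: str, mirror_output: str) -> List[str]:
    """Refs that are new on the origin or point elsewhere than in the mirror."""
    origin = parse_refs(origin_output)
    mirror = parse_refs(mirror_output)
    return sorted(ref for ref, sha in origin.items() if mirror.get(ref) != sha)


def check_mirror(repo_url: str, mirror: str) -> List[str]:
    # Runs git ls-remote against both the origin and the local mirror.
    def ls_remote(target: str) -> str:
        return subprocess.run(
            ["git", "ls-remote", target],
            capture_output=True, text=True, check=True,
        ).stdout

    return stale_refs(ls_remote(repo_url), ls_remote(mirror))
```

An empty list from `check_mirror()` means every advertised ref matches; on an active repository it will rarely stay empty for long.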

To keep a mirror useful, it must be updated regularly. A few possible approaches:

  1. Cron-based script
    Write a small shell script that loops over each repo in your mirror directory and runs git remote update, then schedule it via cron.
  2. gitman subcommand
    Add a gitman command (e.g. mirror-update) that refreshes all local mirrors on demand.
  3. Automatic refresh on install/update (recommended)
    Hook into gitman’s install or update flow so it automatically runs git remote update for any existing dependency mirrors.
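Option 1 could look roughly like the Python sketch below. It assumes every mirror sits directly under one cache directory with a `.reference` suffix (as in `.gitman_cache/<repo>.reference` above); the function names are hypothetical. `git remote update --prune` fetches all remotes and drops refs deleted upstream, and since a `--mirror` clone uses the refspec `+refs/*:refs/*`, every branch and tag is refreshed. The `run` parameter is injectable so the loop can be tested without network access.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def default_run(cmd: Sequence[str]) -> int:
    # Returns git's exit code instead of raising, so one bad mirror
    # does not stop the others from being updated.
    return subprocess.run(list(cmd)).returncode


def find_mirrors(cache_root: str) -> List[Path]:
    """Directories named *.reference directly under cache_root."""
    root = Path(cache_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*.reference") if p.is_dir())


def update_all_mirrors(cache_root: str, run: Runner = default_run) -> List[Path]:
    """Run `git remote update --prune` in each mirror; return the ones that failed."""
    failed = []
    for mirror in find_mirrors(cache_root):
        if run(["git", "-C", str(mirror), "remote", "update", "--prune"]) != 0:
            failed.append(mirror)
    return failed
```

A small script calling `update_all_mirrors()` on the cache directory can then be scheduled with cron, for example nightly.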

I personally lean toward option 3: having gitman refresh the mirror on every install or update keeps it current without any extra maintenance.
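As a rough idea of what option 3 might amount to: before a dependency is cloned or updated, make sure its mirror exists and is current, so a subsequent clone or fetch should find recent objects in the cache instead of downloading them again. `ensure_mirror` and its signature are hypothetical, not Gitman's API, and where it would be called from in the install/update flow is still open.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def default_run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)  # raise if git fails


def ensure_mirror(repo_url: str, mirror: Path, run: Runner = default_run) -> str:
    """Create the mirror on first use, refresh it on every later call.

    Returns "created" or "updated" so callers can log what happened.
    """
    if not mirror.exists():
        mirror.parent.mkdir(parents=True, exist_ok=True)
        run(["git", "clone", "--mirror", repo_url, str(mirror)])
        return "created"
    # Refresh before the dependency's own clone/fetch runs.
    run(["git", "-C", str(mirror), "remote", "update", "--prune"])
    return "updated"
```

The design choice is simply "refresh on every install/update"; a time-based throttle could be added later if the extra fetch turns out to be noticeable.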

Best regards,
Max

mastupristi · Apr 25 '25 19:04

Yeah, option 3 sounds reasonable. The user shouldn't need to know about this.

jacebrowning · Apr 25 '25 19:04

Hi @jacebrowning, OK, I'll try to open a new PR in the next few days.

Best regards,
Max

mastupristi · Apr 25 '25 20:04

@jacebrowning

As I dig further into the program flow to figure out where to put the cache update before making the changes, I'm coming up with quite a few questions about the code. Here they are:

update_files()

I can't make sense of what I'm seeing:
[Screenshot of update_files()]

If execution reaches line 133, the working directory has definitely been cloned. Right after that (lines 144-147), the code checks whether the working directory is a valid git repo and, if so, calls the rebuild. The rebuild, however, doesn't take the cache or sparse checkout into account at all. Is this correct and intended?

I'll post further questions here as they come up.

mastupristi · Apr 28 '25 08:04