Question: how is the Gitman cache updated?
Hi! I'm trying to better understand how Gitman's cache system works in practice.
As far as I can tell, the cache is created using:
git clone --mirror <repo> <cache_path>
This results in a bare mirror of the original repository stored locally (under .gitman_cache or $GITMAN_CACHE). This is very useful to speed up subsequent clones and to save bandwidth.
However, I couldn't find any documentation or code that clearly describes how and when the cache is refreshed after its first creation.
My questions:
- Is the cache ever updated by Gitman once it's been created?
- Is there any built-in mechanism to update or re-fetch tags/commits from the origin into the cache mirror?
- Or is it up to the user to periodically run something like git fetch --all inside .gitman_cache/<repo>.reference?
Thanks a lot for this great tool, and for any clarification you can provide!
Max
I'm pretty sure the cache is managed by git itself as a native feature, and updated from git pull.
Hi @jacebrowning
I'm pretty sure the cache is managed by git itself as a native feature, and updated from git pull.
it seems that this is not the case:
https://stackoverflow.com/a/79591897/552247
Having said that, and leaving push aside, what would you think of adding an automatic or manual mechanism to gitman that updates the cache?
Are you suggesting the cache does not actually serve the purpose of speeding up subsequent clones? I have never seen a need to manually update the cache.
hi @jacebrowning
In my tests I confirmed that --reference-if-able does not update the reference repository when running git fetch or git push.
You can try it yourself using this script.
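For readers who cannot reach that script, the behaviour can be reproduced roughly as follows. This is an illustrative Python sketch, not the script mentioned above: the directory names and the helpers (git, commit_file, has_object, mirror_goes_stale) are made up for the demonstration, and it assumes git is available on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd=None):
    """Run a git command with a throwaway identity and return its stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com",
         "-c", "commit.gpgsign=false", *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def commit_file(repo, name):
    """Create a file, commit it, and return the new commit hash."""
    (Path(repo) / name).write_text(name)
    git("add", name, cwd=repo)
    git("commit", "-m", f"add {name}", cwd=repo)
    return git("rev-parse", "HEAD", cwd=repo)


def has_object(repo, sha):
    """Return True if the repository's object store contains the object."""
    check = subprocess.run(["git", "cat-file", "-e", sha], cwd=repo,
                           capture_output=True)
    return check.returncode == 0


def mirror_goes_stale():
    """Return (stale_after_fetch, fresh_after_remote_update)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        origin = root / "origin"
        mirror = root / "mirror.reference"
        work = root / "work"

        git("init", str(origin))
        commit_file(origin, "a.txt")

        # Build the cache mirror, then clone the working copy against it.
        git("clone", "--mirror", str(origin), str(mirror))
        git("clone", "--reference-if-able", str(mirror), str(origin), str(work))

        # New upstream work arrives after the mirror was created.
        new_sha = commit_file(origin, "b.txt")
        git("fetch", "origin", cwd=work)

        # Fetching in the working copy leaves the mirror untouched.
        stale = not has_object(mirror, new_sha)

        # Only an explicit update of the mirror brings the commit in.
        git("remote", "update", cwd=mirror)
        fresh = has_object(mirror, new_sha)
        return stale, fresh
```

Calling mirror_goes_stale() returns a pair of booleans: whether the mirror was missing the new commit after the working copy fetched, and whether it contains the commit after git remote update was run inside the mirror.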
From the Git manual (git clone):
“--reference[-if-able] <repository>
If the reference <repository> is on the local machine, automatically set up
.git/objects/info/alternates to obtain objects from the reference
<repository>. Using an already existing repository as an alternate will
require fewer objects to be copied from the repository being cloned, reducing
network and local storage costs.”
https://git-scm.com/docs/git-clone#Documentation/git-clone.txt-code--reference-if-ableltrepositorygtcode
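To see what that setup amounts to on disk, the alternates file of a clone can be inspected directly. It is a plain text file listing one object directory per line. The helper below is a minimal sketch with a hypothetical name (read_alternates). It only handles the common layouts, a working copy with a .git directory or a bare repository, and is not part of git or gitman.

```python
from pathlib import Path


def read_alternates(repo_path):
    """Return the alternate object directories recorded for a repository."""
    repo = Path(repo_path)
    # A regular clone keeps its metadata in ".git"; a bare repository
    # is the metadata directory itself.
    git_dir = repo / ".git" if (repo / ".git").is_dir() else repo
    alternates = git_dir / "objects" / "info" / "alternates"
    if not alternates.is_file():
        return []
    # One path per line; blank lines are skipped.
    return [line.strip()
            for line in alternates.read_text().splitlines()
            if line.strip()]
```

For a clone created with --reference-if-able, the returned list would be expected to contain the objects directory of the cache mirror. Nothing in that file describes a remote, which is consistent with the mirror not being updated by fetches in the clone.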
Thank you @jacebrowning for prompting this discussion. As the manual makes clear, using --reference[-if-able] simply shares object storage with the reference repo—it does not register that repo as a target for fetch or push. Therefore, any updates still must be pulled or pushed manually into the mirror.
best regards Max
Sharing object storage isn't enough to speed up future clones?
Hi @jacebrowning
Sharing object storage isn't enough to speed up future clones?
While sharing object storage via --reference[-if-able] does indeed reduce the cost of copying objects at clone time, that benefit shrinks as the mirror grows stale. On an active repository, after a few months, many of the objects a new clone needs will not be in the mirror, so future clones end up pulling a growing share of the data from the origin and lose both the bandwidth and the disk-space savings.
To keep a mirror useful, it must be updated regularly. A few possible approaches:
1. Cron-based script
   Write a small shell script that loops over each repo in your mirror directory and runs git remote update, then schedule it via cron.
2. gitman subcommand
   Add a gitman command (e.g. mirror-update) that refreshes all local mirrors on demand.
3. Automatic refresh on install/update (recommended)
   Hook into gitman’s install or update flow so it automatically runs git remote update for any existing dependency mirrors.
I personally lean toward option 3 — having gitman refresh the mirror whenever you install or upgrade ensures it stays current without any extra maintenance.
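Whichever option is chosen, the core of the refresh is the same loop. A minimal sketch in Python, assuming mirrors live directly under the cache directory and are named <repo>.reference as in the path quoted earlier in the thread. The function names (find_mirrors, refresh_mirrors) are hypothetical and do not exist in gitman.

```python
import subprocess
from pathlib import Path


def find_mirrors(cache_dir):
    """Return mirror directories directly under cache_dir, sorted by name."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # Assumption: each mirror is a directory named "<repo>.reference".
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and p.name.endswith(".reference"))


def refresh_mirrors(cache_dir, run=subprocess.run):
    """Run `git remote update` in every mirror; return the ones that failed."""
    failed = []
    for mirror in find_mirrors(cache_dir):
        # `run` is injectable so the loop can be exercised without git.
        result = run(["git", "remote", "update"], cwd=mirror)
        if result.returncode != 0:
            failed.append(mirror)
    return failed
```

A cron job would call refresh_mirrors on the cache directory on a schedule, while options 2 and 3 would call the same loop from a command or from the install/update flow. Returning the failed mirrors instead of raising lets one unreachable remote be reported without blocking the refresh of the others.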
best regards Max
Yeah, option 3 sounds reasonable. The user shouldn't need to know about this.
Hi @jacebrowning ok, in the next few days I will try to make a new PR
best regards Max
@jacebrowning
As I'm getting further into understanding the program flow, to figure out where to put the cache update and then make the changes, I'm coming up with quite a few questions about the code. I report them below:
update_files()
I can't make sense of what I'm seeing:
If execution reaches line 133, the working dir has definitely been cloned. Next (lines 144-147) it checks whether the working dir is a valid git repo and, if so, calls the rebuild. The rebuild, though, doesn't consider the cache or sparse checkout at all. Is this correct and intended?
As soon as I have more questions I will post them here.