git-client-plugin
JENKINS-13493 - Automatic Git Cache Maintenance
This PR adds git maintenance support to the Git Client Plugin so that it can run maintenance tasks on Git caches.
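As a rough illustration of what running a maintenance task involves, command-line Git exposes each task through `git maintenance run --task=<name>`. The sketch below only builds that argument list. The class, enum, and method names are hypothetical and are not the API added by this PR, and it assumes a command-line Git new enough to provide the `git maintenance` subcommand.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not the plugin's API. */
final class MaintenanceCommands {

    /** Task names accepted by `git maintenance run --task=<name>`. */
    enum Task {
        GC("gc"),
        COMMIT_GRAPH("commit-graph"),
        PREFETCH("prefetch"),
        LOOSE_OBJECTS("loose-objects"),
        INCREMENTAL_REPACK("incremental-repack");

        final String cliName;

        Task(String cliName) {
            this.cliName = cliName;
        }
    }

    /** Builds the argument list for running a single maintenance task. */
    static List<String> runCommand(Task task) {
        return List.of("git", "maintenance", "run", "--task=" + task.cliName);
    }
}
```

A caller would pass the resulting list to whatever process launcher it already uses, with the cache directory as the working directory.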
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.
- [x] I have read the CONTRIBUTING doc
- [x] I have referenced the Jira issue related to my changes in one or more commit messages
- [x] I have added tests that verify my changes
- [x] Unit tests pass locally with my changes
- [ ] I have added documentation as necessary
- [ ] No Javadoc warnings were introduced with my changes
- [ ] No spotbugs warnings were introduced with my changes
- [ ] I have interactively tested my changes
Types of changes
- [ ] Infrastructure change (non-breaking change which updates dependencies or improves infrastructure)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Just a bit of a concern: if some long-lived workspaces on the same host make Git use references to "another index", I believe GC can invalidate those workspaces. At least, it is a documented no-no for reference repositories.
Are the expected use cases here susceptible to that?
The caches that are being managed are on the Jenkins controller. The Jenkins project recommends against running jobs on the Jenkins controller. I don't think we should sacrifice performance for the general use case of caches on the controller in order to avoid a risk to jobs that run on the controller (not recommended) and have somehow obtained the directory name of the cache so that they can use it as a reference repository (there is no published way for user jobs to learn the directory name of the cache on the controller).
In that case I worry about things Jenkins checks out internally (e.g. shared libraries). Is every checkout unique and not used as a remote or reference for others? All cool then :)
Just in case, note also that my PR #644 on reference repo fanout is also used by checkouts of shared libraries, if so configured (and if reference repos are available on the controller), in the same way that the current vanilla plugin codebase can use one directory for such reference repos to speed up JSL checkouts. During development of the helper scripts to manage/update that repo, I was bitten by garbage collection, so they now default to disabling GC.
I am not sure if any such Git workspaces for JSLs persist, but I would suppose that in general they do (there are options to wipe before clone, etc. - but these are not general-case defaults). People should know to take care to not "gc" the refrepo locations, and the plugin should not do that either. For leaf repos it should be fine. Just saying :)
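For context on the risk discussed above: a clone made with a reference repository records the borrowed object directory in its `objects/info/alternates` file, one path per line. A minimal sketch of checking whether a given repository borrows objects this way might look like the following. The class and method names are hypothetical, and this is not something the PR implements.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of the plugin. */
final class Alternates {

    /**
     * Returns the object directories that the repository at gitDir borrows
     * from, or an empty list if it has no alternates file.
     * gitDir is the .git directory (or the bare repository root).
     */
    static List<String> borrowedObjectDirs(Path gitDir) throws IOException {
        Path alternates = gitDir.resolve("objects").resolve("info").resolve("alternates");
        if (!Files.isRegularFile(alternates)) {
            return List.of();
        }
        return Files.readAllLines(alternates).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }
}
```

A non-empty result means the repository depends on objects stored elsewhere, so pruning objects in those locations can break it.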
Needs tests that call the new methods and check that they are functional.
The API may change, as I am planning to add the git ls-remote command to check whether a cache is private.
This could be added in the git-plugin as well.
@Hrushi20 I can duplicate the maintenance test failure in GitClientTest on Java 17.0.4.1 on my Red Hat Enterprise Linux 8.6 machine and on my Ubuntu 22.04 machine. The test passes on Java 11.0.16.1 on both those machines.
@MarkEWaite, should I skip prefetch for ssh protocols? I tried using the git ls-remote command and CliGit waits indefinitely for me to enter my ssh password. For http requests, git ls-remote fails and we skip to the next cache.
Also, @MarkEWaite, I removed the git maintenance legacy method.
Yes, skip prefetch for ssh protocol repositories. The call to git ls-remote should have failed immediately instead of waiting for a password. That likely indicates that it may be time to make the setsid behavior the default.
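If an explicit check for ssh remotes were ever wanted, a rough heuristic could look like the sketch below. It treats `ssh://` URLs and scp-like `user@host:path` remotes (a colon with no slash before it) as ssh. The class and method names are hypothetical, and the heuristic is deliberately simple: it will misclassify a scp-like remote whose host name is a single letter, because it treats that form as a Windows drive path.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the plugin. */
final class RemoteUrls {

    /** Heuristically decides whether a remote URL would be fetched over ssh. */
    static boolean looksLikeSsh(String url) {
        String u = url.trim().toLowerCase(Locale.ROOT);
        if (u.startsWith("ssh://")) {
            return true;
        }
        if (u.contains("://")) {
            return false; // some other explicit scheme, such as https or file
        }
        int colon = u.indexOf(':');
        int slash = u.indexOf('/');
        // scp-like syntax: a colon with no slash before it.
        // colon > 1 avoids treating a Windows drive path such as c:/repo as ssh.
        return colon > 1 && (slash == -1 || colon < slash);
    }
}
```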
I assume you were running Jenkins from a command line with either java -jar jenkins.war or with mvn hpi:run. If it was one of those two runtime modes, then that means there was a controlling terminal available to the Jenkins process. Command line git calls command line ssh and command line ssh checks for a controlling terminal. If it finds one, then it prompts for the password.
If you were running Jenkins as a service, there would be no controlling terminal, so the ssh command would have exited immediately.
There is a setting that can force it to exit immediately (something about setsid), but that is not the default. I was too scared to change that default lest someone were critically dependent on the current behavior.
So I'll call git ls-remote. Based on your above statement, it will fail immediately for ssh protocol. Once failed, I can skip the prefetch on that task. I'm not implementing a specific way to check for ssh protocol as it is going to fail on its own.
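The probe-then-skip idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class and method names are hypothetical, and it sets `GIT_TERMINAL_PROMPT=0` and an ssh `BatchMode=yes` option so that the probe fails instead of prompting even when a controlling terminal is present. A timeout guards against any remaining hang.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not the plugin's implementation. */
final class RemoteProbe {

    /** Argument list for a lightweight reachability check of a remote. */
    static List<String> lsRemoteCommand(String remoteUrl) {
        return List.of("git", "ls-remote", "--heads", remoteUrl);
    }

    /**
     * Environment that asks git and ssh not to prompt interactively.
     * Note: GIT_SSH_COMMAND overrides any ssh command configured elsewhere.
     */
    static Map<String, String> nonInteractiveEnv() {
        return Map.of(
                "GIT_TERMINAL_PROMPT", "0",
                "GIT_SSH_COMMAND", "ssh -o BatchMode=yes");
    }

    /** Returns true only if ls-remote exits with status 0 within the timeout. */
    static boolean isReachable(String remoteUrl, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(lsRemoteCommand(remoteUrl));
        pb.environment().putAll(nonInteractiveEnv());
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        process.getOutputStream().close(); // the child gets no standard input
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // treat a hang as "not reachable"
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

A maintenance loop would call `isReachable` before the prefetch task and move on to the next cache when it returns false.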
That's fine as well. The cases where it will not fail immediately are cases where Jenkins is running in the foreground and has not been started with the setsid property enabled with
$ java -Dorg.jenkinsci.plugins.gitclient.CliGitAPIImpl.useSETSID=true -jar jenkins.war
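To illustrate how a boolean system property of this kind could take effect, the sketch below prepends `setsid` to a git command line when the property is true, so that the child process runs in a new session without the controlling terminal. This is a simplified assumption about the mechanism, not a copy of the plugin's code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not the plugin's code. */
final class SetsidLauncher {

    /** Property name taken from the command line shown above. */
    static final String PROPERTY =
            "org.jenkinsci.plugins.gitclient.CliGitAPIImpl.useSETSID";

    /** Prefixes the command with setsid when the system property is true. */
    static List<String> wrap(List<String> gitCommand) {
        List<String> command = new ArrayList<>();
        if (Boolean.getBoolean(PROPERTY)) {
            command.add("setsid"); // detach the child from the controlling terminal
        }
        command.addAll(gitCommand);
        return command;
    }
}
```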