subversion-plugin
subversion-plugin copied to clipboard
Check out subversion modules in parallel to speed up checkout
This change lets the plugin check out modules in parallel. We have a project which has a big number of SVN modules and checking out/updating the modules alone took > 2 minutes. Checking out with 4 threads in parallel brought the time down to 30 secs.
We have been running this code on our Jenkins for > 2 years now without any issues. I've recently rebased the changes on top op subversion-plugin 2.5
How does this affect checking out into overlapping directories, e.g. one module to .
, the other to foo
?
Thread count should be configurable (System property to be set on master would suffice IMO), and this code needs to make sure to work correctly with one thread.
Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests
@daniel-beck good point about overlapping directories. I guess the behaviour would be unpredictable in worst case. I'm already checking for exact duplicates and issue a warning. Will try to do the same for overlaps, but I'm pretty sure that could become arbitrary complex if someone is using relative paths.
@daniel-beck what do you mean by: "this code needs to make sure to work correctly with one thread"? Do you see any problems in that area? I think I accounted for that with the CurrentThreadExecutorService
@daniel-beck I've implemented your suggested changes
I expect the most common case to be as I described with one of them being .
, so that case should be handled. I certainly use that in numerous projects due to how projects are laid out in Subversion.
I'd also prefer if that kind of configuration wasn't effectively deprecated.
I'm not sure, I yet understand what you mean by the common case, but I guess it's something like this:
- you check out your main svn module to
.
- you check out an auxiliary module (maybe some environment templates) to a subfolder of this (e.g.
env
) Right?
In that case it should be safe to check out in parallel - given the 'sane case' that the main module doesn't contains an env
folder by this self.
However, we cannot assume that each user has such a 'sane case'.
It the main module would also contain the env folder, the behaviour would be pretty much unpredictable on paralle checkout.
I can see various solutions to this:
- detect overlaps in the local dirs and serialize checkout only for this modules: bad, because adds lot of complexity. Also very probable to not detect 100% of this cases
- lets users decide - new option to checkout in paralle: bad because of new option. Also users probably wouldn't understand full consequences of checking out in parallel
- detect overlaps and check out only with 1 thread for such jobs: probably unnecessary slows down checkout for those jobs and also probably will not detect 100% of cases
- keep it as is and check out all jobs single-threaded
I'd prefer to implement option 3. If there are problems because of not detected corner cases, users could still use the system property to use single-threaded check out.
What do you think?
@daniel-beck do you have any feedback?
@kutzi I have not had the time to try this PR in a staging environment, and am still on 1.565.3 and 2.4.x in production.
I wonder though, shouldn't it be fairly easy to determine which checkouts need to be serialized based on different levels in the folder hierarchy? For example, .
needs to be independent of others while foo
and bar
can be done in parallel, as can foo/bar/baz
and qux
. foo
and foo/bar/baz
though cannot (or should not, conservatively).