How to optimise checkout of a monorepo?
For a monorepo of size significantly greater than 1GB, cloning or fetching the repository can be time intensive and degrade the performance of a job.
In a self-hosted runner, one way to remedy the situation is to have a pre-provisioned repository in the runner. That way, improvement can be gained...
Using git clone ...
There are two methods for optimising git clone ...
- `--reference[-if-able] <repository>.git` will use the local cache first, if available, and then reach out to the upstream remote for newer objects that are not there.
- `git config --global url."file:///local/repository".insteadOf https://github.com/remote/repository.git`. This can be used for cloning but not for (fetch or) push.
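To make the two methods concrete, here is a minimal, self-contained sketch. It assumes a throwaway local bare repository standing in for the upstream remote, and every path under `$work` is hypothetical. The second method is shown with a per-command `-c` setting instead of `git config --global`, so that the sketch does not modify the user's global configuration.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway paths; a real runner would use its own cache
# directory and the real remote URL.
work=$(mktemp -d)
upstream="$work/upstream.git"    # stands in for the upstream remote
cache="$work/cache.git"          # stands in for the pre-provisioned copy
checkout="$work/checkout"
rewritten="$work/rewritten"

# Build a tiny "remote" with one commit so the example runs offline.
git init --quiet --bare "$upstream"
git init --quiet "$work/seed"
git -C "$work/seed" -c user.name=example -c user.email=example@example.invalid \
    commit --quiet --allow-empty -m "initial commit"
git -C "$work/seed" push --quiet "$upstream" HEAD

# Pre-provision the cache once, e.g. when the runner is set up.
git clone --quiet --mirror "$upstream" "$cache"

# Method 1: borrow objects from the cache and fetch only what is missing.
git clone --quiet --reference-if-able "$cache" "$upstream" "$checkout"

# The new clone records the cache as an alternate object store.
cat "$checkout/.git/objects/info/alternates"

# Method 2: rewrite the remote URL to the local cache for this one command.
git -c url."file://$cache".insteadOf=https://github.com/remote/repository.git \
    clone --quiet https://github.com/remote/repository.git "$rewritten"
```

Note that a clone made with `--reference` keeps depending on the cache for its objects, so the cache must not be deleted or pruned while such clones are in use; `git clone --dissociate` can be added to copy the borrowed objects instead.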
Question❓
It is currently difficult or impossible to leverage the above measures with actions/checkout, for the following reasons:
- Rather than `git clone ...`, `git init` is used. Hence, the `--reference[-if-able] <repository>` option is not supported by `actions/checkout`. Is there any reason why the `git init` flow was preferred over `git clone ...`❓ Or can we switch to `git clone ...` and then support the `--reference[-if-able] <repository>` option?
- Would it help if I pointed `actions/checkout` to an existing `git` repository?
On the second question: I suppose it won't help to point to a directory with an existing repository, due to the following aspect of the current implementation.
Same for my 7 GB repo 😢 The only option is to simply use CLI command steps instead of the checkout action. I wonder if there will be support for using an existing pre-cached repo in the runner, instead of a fresh init and clone? 🙏
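As an illustration of the CLI workaround mentioned above, the following is a rough sketch of a helper that a workflow `run:` step could source. The function name `cached_checkout` and its argument order are hypothetical, and authentication for private repositories is left out.

```shell
#!/bin/sh
# cached_checkout REMOTE CACHE DEST [REF]   (hypothetical helper)
# Clones REMOTE into DEST, borrowing objects from CACHE when it exists,
# then optionally checks out REF as a detached HEAD.
cached_checkout() {
    remote=$1
    cache=$2
    dest=$3
    ref=${4:-}

    # --reference-if-able falls back to a normal clone (with a warning)
    # when the cache directory is missing or unusable.
    git clone --quiet --reference-if-able "$cache" "$remote" "$dest" || return 1

    # Pin the working tree to a specific commit if one was requested.
    if [ -n "$ref" ]; then
        git -C "$dest" checkout --quiet --detach "$ref" || return 1
    fi
}
```

In a job, this might be called as `cached_checkout "$GITHUB_SERVER_URL/$GITHUB_REPOSITORY.git" /path/to/cache.git "$GITHUB_WORKSPACE/src" "$GITHUB_SHA"`, where `/path/to/cache.git` is a placeholder for wherever the runner keeps its pre-provisioned copy. For some events the commit in `GITHUB_SHA` may not be reachable from the branches a plain clone fetches, in which case an extra `git fetch` of the relevant ref would be needed first.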