Reduce Repository Size

[Open] apopiak opened this issue 3 years ago • 12 comments

Is there an existing issue?

  • [X] I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • [X] This is not a support question.

Description of bug

The Substrate repository has grown very large, so other crates and repos that depend on it have to download a lot of data when building. It would be great to investigate how to reduce its size so that clones are faster.

Steps to reproduce

For example, clone https://github.com/open-web3-stack/open-runtime-module-library and run cargo test: it pulls in a patched version of Substrate, which is downloaded in full.

apopiak, May 28 '22

The root problem is actually cargo, more specifically https://github.com/rust-lang/cargo/issues/1171

Because Cargo doesn't support shallow clones, it downloads the whole repo instead. A shallow clone takes a few seconds; a full clone takes ~15 minutes for us in CI. As mentioned in that thread, the root issue is a libgit2 limitation, and gitoxide (a Git implementation in Rust) is not feature-rich enough to replace it.
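
To make the shallow-versus-full difference concrete, here is a local sketch. It is not Cargo's own code path: it builds a small throwaway repository as a hypothetical stand-in for a large upstream such as Substrate, then clones it both ways. The `file://` URL matters, because git ignores `--depth` for plain local-path clones.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for a large upstream repository.
work=$(mktemp -d)
git init -q "$work/upstream"

# Build up some history: each commit rewrites the same file.
for i in 1 2 3 4 5; do
  echo "revision $i" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" commit -q -m "commit $i"
done

# A full clone transfers every commit; a shallow clone only the tip.
git clone -q "file://$work/upstream" "$work/full"
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone: $full_count commits, shallow clone: $shallow_count commit"
```

On a repository with tens of thousands of commits the same gap shows up as the seconds-versus-minutes difference described above.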

nazar-pc, May 28 '22

Sure, the root problem is Cargo. But since that might take a while to get fixed, IMO it would be good to investigate which mitigations we can apply to the repo size. E.g. I remember there was talk that the Substrate and Polkadot repos got a lot heavier because the docs are kept in branches that keep growing.
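
One way to check how much a docs branch contributes is to sum the sizes of the objects reachable from it but not from `master`. The sketch below does this on a hypothetical throwaway repository whose `gh-pages` branch regenerates the same file on every build; against a real clone only the final pipeline is needed. Sizes are uncompressed, so they may overstate what is actually transferred.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical repository with a small code branch.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo 'fn main() {}' > "$repo/main.rs"
git -C "$repo" add main.rs
git -C "$repo" commit -q -m "code"

# Hypothetical docs branch with unrelated history.
git -C "$repo" checkout -q --orphan gh-pages
git -C "$repo" rm -q -r -f --cached .
for build in 1 2 3; do
  # Each "build" overwrites index.html with 10 kB of fresh content;
  # every old version stays in history.
  head -c 10000 /dev/urandom > "$repo/index.html"
  git -C "$repo" add index.html
  git -C "$repo" commit -q -m "docs build $build"
done

# Sum uncompressed sizes of objects reachable from gh-pages but not master.
docs_bytes=$(git -C "$repo" rev-list --objects gh-pages --not master |
  git -C "$repo" cat-file --batch-check='%(objectsize) %(rest)' |
  awk '{ total += $1 } END { print total + 0 }')
echo "gh-pages adds $docs_bytes bytes (uncompressed) on top of master"
```

Only one `index.html` is checked out, yet all three versions are counted, which is how a docs branch that gets a new commit per publish keeps growing.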

apopiak, May 30 '22

Here is a git-sizer run (the repo is 43 GB ⚠️):

```text
Processing blobs: 1004964
Processing trees: 248480
Processing commits: 24336
Matching commits to trees: 24336
Processing annotated tags: 38
Processing references: 1270
```
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Blobs                      |           |                                |
|   * Total size               |  43.1 GiB | ****                           |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [1] |  5.08 k   | *****                          |
| * Blobs                      |           |                                |
|   * Maximum size         [2] |  60.5 MiB | ******                         |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [3] |  32.7 k   | ****************               |
| * Maximum path depth     [4] |    12     | *                              |
| * Maximum path length    [4] |   189 B   | *                              |
| * Number of files        [5] |   592 k   | ***********                    |
| * Total size of files    [6] |  9.14 GiB | *********                      |

[1]  e20c999e130527d0e60a15629a5997d4dc95cc68 (refs/remotes/origin/gh-pages:crate-docs/libc)
[2]  b53d9a26bc65fb3465cb78639f0f90d939fbda65 (902a8ccf81aa3543f2aa6b455360efdcd9a790a2:substrate/state-machine/core)
[3]  4ac0009a28b5eb2894e8b7f1704e99b559faf230 (edb401e7f56a58ec9c62274e34763e8d7ec54d6a^{tree})
[4]  5ea3249b2c8491566eafc36a25984bc51ea5bc5a (refs/remotes/origin/gh-pages^{tree})
[5]  af51f47d134a1eda17e4ad651528acba659ecd81 (68c6deea68f58da7f63686f28bf1609e68dcfd44^{tree})
[6]  5e00508c9e93c0444107cee65715d3f8a9c86af9 (542dc2f477a11a2c45a396ec6a10bf8a80f2cad3^{tree})

apopiak, May 30 '22

CC @TriplEight

bkchr, May 30 '22

gh-pages should be moved to another repo, or at least publishing should overwrite a single commit instead of pushing a new one each time. I tried changing the script to do so a while ago but could not test it on my machine. Some of the generated doc file paths differ only by case, which makes that branch impossible to work with on a case-insensitive file system such as APFS.
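
A minimal sketch of the single-commit approach, assuming the docs are first generated into a directory: each publish creates a fresh repository with one root commit on `gh-pages` and force-pushes it, so the branch never accumulates history. The `publish_docs` helper, the directory layout, and the local bare repository standing in for GitHub are all hypothetical; this is not the script mentioned above.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Local bare repository standing in for the real remote (hypothetical).
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"

# Hypothetical helper. $1: directory holding freshly generated docs.
publish_docs() {
  local site=$1
  git -C "$site" init -q
  git -C "$site" symbolic-ref HEAD refs/heads/gh-pages
  git -C "$site" add -A
  git -C "$site" commit -q -m "Update docs"
  # The force-push replaces the previous gh-pages commit instead of adding one.
  git -C "$site" push -q --force "$tmp/remote.git" gh-pages:gh-pages
}

# Simulate three CI runs, each producing a new docs build.
for build in 1 2 3; do
  site="$tmp/site-$build"
  mkdir -p "$site"
  echo "<html>docs build $build</html>" > "$site/index.html"
  publish_docs "$site"
done

pages_commits=$(git -C "$tmp/remote.git" rev-list --count gh-pages)
echo "gh-pages history length: $pages_commits"
```

Objects from replaced commits become unreachable rather than deleted, so the space is only reclaimed after the server runs garbage collection.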

arkpar, May 30 '22

Ack, thanks. I've seen the same problem with the Polkadot repo: a simple git clone downloads 1.6 GB at the moment. I'll see what I can do.

TriplEight, Jun 03 '22

We really need a solution to this. I started a cargo check, it began downloading repos, and it was still downloading after I had finished my coffee.

xlc, Jun 06 '22

What I ended up doing is using our own fork of the Substrate repo with most branches (including gh-pages) removed. This helped a lot with build times.
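
A sketch of the fork-with-fewer-branches idea, using a local stand-in for upstream: `git clone --single-branch` fetches only `master` and the objects it reaches, and pushing that clone to a fresh repository yields a fork without `gh-pages` or stale branches. The branch names and paths are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for upstream, with extra branches we do not want to carry.
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/master
echo 'pub fn f() {}' > "$tmp/upstream/lib.rs"
git -C "$tmp/upstream" add lib.rs
git -C "$tmp/upstream" commit -q -m "code"
git -C "$tmp/upstream" branch gh-pages      # hypothetical docs branch
git -C "$tmp/upstream" branch old-feature   # hypothetical stale branch

# Fetch only master and the objects it reaches.
git clone -q --single-branch --branch master "file://$tmp/upstream" "$tmp/slim"

# Push that clone to a fresh repository that serves as the slim fork.
git init -q --bare "$tmp/fork.git"
git -C "$tmp/slim" push -q "$tmp/fork.git" master

fork_branches=$(git -C "$tmp/fork.git" for-each-ref --format='%(refname:short)' refs/heads/)
echo "branches in fork: $fork_branches"
```

A Cargo `git` dependency (or a `[patch]` entry) can then point at the fork's URL instead of upstream.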

nazar-pc, Jun 07 '22

Any news on this?

apopiak, Sep 19 '22

@paritytech/ci could anyone look at this?

bkchr, Sep 20 '22

Linking https://github.com/github/git-sizer, which has some good suggestions on how to reduce repository size.

radupopa2010, Sep 21 '22

Recreated gh-pages from scratch as a quick fix. We still need to investigate what else can be removed.

alvicsam, Sep 21 '22

Recreating gh-pages seems to have reduced the size to about 8 GB, so now it looks like this:

git-sizer output
```text
Processing blobs: 235799
Processing trees: 462208
Processing commits: 72647
Matching commits to trees: 72647
Processing annotated tags: 46
Processing references: 9164
```
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  72.6 k   |                                |
|   * Total size               |  42.7 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   462 k   |                                |
|   * Total size               |   192 MiB |                                |
|   * Total tree entries       |  5.42 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |   236 k   |                                |
|   * Total size               |  8.25 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |    46     |                                |
| * References                 |           |                                |
|   * Count                    |  9.16 k   |                                |
|     * Branches               |   553     |                                |
|     * Tags                   |   117     |                                |
|     * Remote-tracking refs   |     3     |                                |
|     * Pull request refs      |  8.49 k   |                                |
|     * Other                  |     1     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   154 KiB | ***                            |
|   * Maximum parents      [2] |     3     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   307     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  60.5 MiB | ******                         |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  6.87 k   |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |  4.00 k   | **                             |
| * Maximum path depth     [7] |    12     | *                              |
| * Maximum path length    [8] |   181 B   | *                              |
| * Number of files        [6] |  23.3 k   |                                |
| * Total size of files    [6] |  1.02 GiB | *                              |
| * Number of symlinks     [9] |     5     |                                |
| * Number of submodules  [10] |     1     |                                |

[1]  12b306d0c9b641d99ddf8024940a5687c284ae6d (refs/pull/7044/head)
[2]  342e03514ee6029d501cefacc487decda00af5ea
[3]  319d09c0fb9d7b951ffd5daaf07db93fb5e8beb8 (refs/heads/gh-pages:crate-docs/kitchensink_runtime)
[4]  b53d9a26bc65fb3465cb78639f0f90d939fbda65 (902a8ccf81aa3543f2aa6b455360efdcd9a790a2:substrate/state-machine/core)
[5]  aa730731c075a93eaed64fe3c8057a509c8de6a8 (refs/tags/ci-release-2.0.0-alpha.5+3)
[6]  be86c137eb44f354247d250feca9be16d02a67ef (refs/heads/gh-pages^{tree})
[7]  92c0ffd01f7d2dac7e3328ff7be84d4d765dc18d (08de8b323232821b7df6e830e203ee8102ba3437^{tree})
[8]  615cdf042f9a186776ff8dabd9b24178403d7ef1 (0132128ed55a44210f0431bdc15275b2b06470fb^{tree})
[9]  78787c6427e8a298fd5814a1ae94a08005b450c2 (refs/pull/1002/head^{tree})
[10] f5696c5b02f9d0b1c320b5106e93bfdce2553121 (refs/pull/9847/head:frame)

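How I calculated it:

```shell
mkdir substrate
# Mirror all refs (branches, tags, refs/pull/*) into substrate/.git
git clone --mirror https://github.com/paritytech/substrate.git substrate/.git
cd substrate
# Turn the bare mirror into a repository with a working tree
git config --unset core.bare
git checkout master
# List branches with for-each-ref: `git branch` prints a "*" that the shell would glob
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
  git checkout "$branch"; git checkout master
done
git-sizer --verbose
```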

Other heavy objects still need investigation.
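
A common recipe for finding the heaviest objects, wrapped in a hypothetical `largest_blobs` helper: `git rev-list --objects --all` lists every reachable object with a path it was found at, and `git cat-file --batch-check` adds its type and size. The demo at the end runs it on a throwaway repository; on a real mirror one would call something like `largest_blobs /path/to/substrate 20`. Sizes are uncompressed bytes, and the path shown is just one place the blob appears.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the N largest blobs reachable from any ref in
# repository $1 as "<size> blob <id> <path>", biggest first.
largest_blobs() {
  local repo=$1 n=${2:-10}
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
    awk '$2 == "blob"' |
    sort -k1,1 -n -r |
    awk -v n="$n" 'NR <= n'   # awk reads all input, so no SIGPIPE under pipefail
}

# Demo on a throwaway repository with one small and one large file.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
git init -q "$demo"
head -c 5000 /dev/zero > "$demo/small.bin"
head -c 50000 /dev/zero > "$demo/big.bin"
git -C "$demo" add small.bin big.bin
git -C "$demo" commit -q -m "add binaries"

top=$(largest_blobs "$demo" 1)
echo "$top"
```

Running it on a `--mirror` clone includes objects reachable only from refs such as refs/pull/*, which a normal clone does not fetch.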

alvicsam, Sep 23 '22