grafting-monorepos
Activity 2: Analysis
Duration: 20 minutes
We will use several analysis tools to identify bad practices in a repository. To do so, we will use the following commands:
- git-sizer
- git-find-dirs-many-files
- git-find-lfs-extensions
- git-find-dirs-unwanted
- git-filter-repo
Before starting any analysis, pick one repository that you would like to analyze.
:warning: Throughout this exercise, make sure you don't post any private information that should not be shared publicly.
Clone this repository; we have added all the tools to it to make the workshop more convenient:
# Clone the repository
git clone https://github.com/githubuniverseworkshops/grafting-monorepos.git
# or use the GitHub CLI
gh repo clone githubuniverseworkshops/grafting-monorepos
Stats of repo size: git-sizer
- Download the corresponding compiled version of git-sizer. Optionally, you can install git-sizer using Homebrew if you are on a Mac.
- Run the tool from the root of the repository to analyze:
/path/to/git-sizer --verbose
Find files that should be in LFS: git-find-lfs-extensions
- Check out the grafting-monorepos repository
- Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-lfs-extensions
Print directories with the number of files contained: git-find-dirs-many-files
- Check out the grafting-monorepos repository
- Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-many-files
Find dirs that should not be committed: git-find-dirs-unwanted
- Check out the grafting-monorepos repository
- Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-unwanted | head -n 15
Analyze the repository: git-filter-repo --analyze
- Clone the git-filter-repo tool
- Execute the tool from the linux repository
/path/to/git-filter-repo/git-filter-repo --analyze
Report out
Report your findings from the above commands in the comments section below. Be sure to include answers to the following questions in your comments, if possible:
- Do you find any patterns?
- Was there anything surprising?
:warning: Throughout this exercise, make sure you don't post any private information that should not be shared publicly.
For examples and more information, please see README.md -> Activity 2.
Here's my git-sizer output:
Processing blobs: 186589
Processing trees: 323092
Processing commits: 51356
Matching commits to trees: 51356
Processing annotated tags: 42
Processing references: 783
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 51.4 k | |
| * Total size | 13.4 MiB | |
| * Trees | | |
| * Count | 323 k | |
| * Total size | 237 MiB | |
| * Total tree entries | 6.14 M | |
| * Blobs | | |
| * Count | 187 k | |
| * Total size | 51.4 GiB | ***** |
| * Annotated tags | | |
| * Count | 42 | |
| * References | | |
| * Count | 783 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 2.08 KiB | |
| * Maximum parents [2] | 2 | |
| * Trees | | |
| * Maximum entries [3] | 1.58 k | * |
| * Blobs | | |
| * Maximum size [4] | 198 MiB | ******************** |
| | | |
| History structure | | |
| * Maximum history depth | 11.1 k | |
| * Maximum tag depth [5] | 1 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [6] | 5.08 k | ** |
| * Maximum path depth [6] | 25 | ** |
| * Maximum path length [6] | 280 B | ** |
| * Number of files [7] | 29.2 k | |
| * Total size of files [8] | 8.30 GiB | ******** |
| * Number of symlinks [9] | 175 | |
| * Number of submodules [10] | 20 | |
@toddocon That looks good in general. The 200MB file might be a good candidate for Git LFS.
Here is my git-sizer output:
Processing blobs: 51587
Processing trees: 100112
Processing commits: 15133
Matching commits to trees: 15133
Processing annotated tags: 17
Processing references: 785
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 15.1 k | |
| * Total size | 6.52 MiB | |
| * Trees | | |
| * Count | 100 k | |
| * Total size | 42.2 MiB | |
| * Total tree entries | 1.16 M | |
| * Blobs | | |
| * Count | 51.6 k | |
| * Total size | 1.21 GiB | |
| * Annotated tags | | |
| * Count | 17 | |
| * References | | |
| * Count | 785 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 33.5 KiB | |
| * Maximum parents [2] | 2 | |
| * Trees | | |
| * Maximum entries [3] | 87 | |
| * Blobs | | |
| * Maximum size [4] | 20.3 MiB | ** |
| | | |
| History structure | | |
| * Maximum history depth | 4.27 k | |
| * Maximum tag depth [5] | 1 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [6] | 623 | |
| * Maximum path depth [7] | 10 | * |
| * Maximum path length [8] | 111 B | * |
| * Number of files [6] | 2.81 k | |
| * Total size of files [9] | 37.9 MiB | |
| * Number of symlinks | 0 | |
| * Number of submodules | 0 | |
My results 😬
git-sizer
git-sizer --verbose
Processing blobs: 19360
Processing trees: 34523
Processing commits: 7588
Matching commits to trees: 7588
Processing annotated tags: 0
Processing references: 116
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 7.59 k | |
| * Total size | 3.02 MiB | |
| * Trees | | |
| * Count | 34.5 k | |
| * Total size | 16.5 MiB | |
| * Total tree entries | 444 k | |
| * Blobs | | |
| * Count | 19.4 k | |
| * Total size | 1.88 GiB | |
| * Annotated tags | | |
| * Count | 0 | |
| * References | | |
| * Count | 116 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 39.5 KiB | |
| * Maximum parents [2] | 2 | |
| * Trees | | |
| * Maximum entries [3] | 193 | |
| * Blobs | | |
| * Maximum size [4] | 521 MiB | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| | | |
| History structure | | |
| * Maximum history depth | 3.20 k | |
| * Maximum tag depth | 0 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [5] | 218 | |
| * Maximum path depth [6] | 10 | * |
| * Maximum path length [7] | 138 B | * |
| * Number of files [5] | 1.26 k | |
| * Total size of files [8] | 813 MiB | |
| * Number of symlinks [9] | 1 | |
| * Number of submodules | 0 | |
git-find-lfs-extensions
/Users/brentconn/IdeaProjects/grafting-monorepos/scripts/git-find-lfs-extensions
Type Extension LShare LCount Count Size Min Max
------- --------- ------- ------- ------- ------- ------- -------
all * 1.0 % 17 1141 144 0 105
binary bson 16.0 % 2 12 107 0 105
binary gz 10.0 % 4 39 7 0 2
binary png 3.0 % 5 146 18 0 0
text json 3.0 % 4 101 4 0 1
text js 0.0 % 1 553 3 0 1
text svg 12.0 % 1 8 1 0 0
Can't post much else due to sensitive information :)
Extensions:
% ./git-find-lfs-extensions
Type Extension LShare LCount Count Size Min Max
------- --------- ------- ------- ------- ------- ------- -------
all * 4.0 % 534 11707 3351 0 149
binary dn 100.0 % 94 94 1034 2 149
binary dll 72.0 % 108 149 497 0 42
binary lib 43.0 % 16 37 271 0 93
binary so 35.0 % 32 90 164 0 38
binary png 8.0 % 40 493 146 0 21
binary dylib 70.0 % 19 27 131 0 32
binary woff 64.0 % 9 14 122 0 13
binary otf 70.0 % 7 10 116 0 16
binary psd 20.0 % 13 62 91 0 51
binary gif 74.0 % 69 93 86 0 3
binary a 44.0 % 4 9 62 0 31
binary jar 100.0 % 1 1 59 59 59
binary dds 34.0 % 8 23 57 0 16
binary exe 76.0 % 10 13 42 0 14
binary pdb 100.0 % 7 7 33 0 13
binary 8bf 100.0 % 2 2 23 7 15
binary exr 27.0 % 9 33 18 0 5
binary jpg 7.0 % 3 42 15 0 9
binary glb 11.0 % 2 17 11 0 6
binary bin 28.0 % 2 7 6 0 3
binary gz 100.0 % 3 3 4 0 2
binary mod 25.0 % 3 12 4 0 1
binary eps 100.0 % 1 1 2 2 2
binary pdf 13.0 % 2 15 3 0 0
binary texture 50.0 % 1 2 1 0 1
binary csf 40.0 % 2 5 1 0 0
binary tga 100.0 % 1 1 0 0 0
binary blend 100.0 % 1 1 0 0 0
binary w/o ext ZZZZpm 100.0 % 6 6 33 2 9
binary w/o ext ZZZZZ_caps 100.0 % 10 10 32 1 8
binary w/o ext standard multiplugin 100.0 % 2 2 25 11 13
binary w/o ext doxygen 100.0 % 1 1 17 17 17
binary w/o ext amtlib 100.0 % 4 4 8 2 2
binary w/o ext getZZZZZZ 100.0 % 1 1 6 6 6
binary w/o ext updaternotifications 100.0 % 4 4 2 0 0
binary w/o ext bcp 100.0 % 1 1 0 0 0
text obj 27.0 % 3 11 20 0 18
text h 0.0 % 12 3819 57 0 6
text fbx 25.0 % 2 8 9 0 8
text c 0.0 % 1 154 10 0 6
text hpp 0.0 % 2 742 16 0 2
text hdr 100.0 % 3 3 4 0 2
text js 0.0 % 2 340 18 0 1
text html 0.0 % 1 287 5 0 2
text dat 50.0 % 4 8 2 0 0
text cpp 0.0 % 2 1851 30 0 1
text f90 5.0 % 1 19 1 0 0
text fi 6.0 % 1 16 0 0 0
text w/o ext configure 100.0 % 2 2 1 0 0
@alubchuk your repo looks really good. Again, the 20 MB file might be a good candidate for LFS. In general, I recommend using LFS for files larger than 1 MB that change regularly.
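To apply the 1 MB rule of thumb to a whole history, something like the sketch below could help. `large_blobs` is a hypothetical helper name (not one of the workshop scripts). It only filters the output of `git cat-file --batch-check`, and the 1 MiB default threshold is an assumption taken from the recommendation above.

```shell
# Hypothetical helper: list blobs above a size threshold, largest first.
# Expects lines of the form "<type> <sha> <size> <path>" on stdin, e.g. from:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     large_blobs 1048576
large_blobs() {
  # $1: threshold in bytes (defaults to 1 MiB, an assumed cut-off)
  awk -v min="${1:-1048576}" '
    $1 == "blob" && $3 + 0 > min + 0 {
      # Everything after the third field is the path (it may contain spaces).
      path = $0
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)
      print $3 "\t" path
    }' | sort -rn
}
```

The output is one line per blob, with the size in bytes and the path separated by a tab. A path that shows up repeatedly has been changed often, which is where LFS tends to pay off most.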
@onetrickwolf The 521 MB file is definitely a candidate for LFS.
@toddocon You have a lot of compiled libraries like `dll`, `lib`, and `so` in your repo. If possible, it would be good to move them to an artifact store or use Git LFS for them.
@toddocon I don't know what `dn` files are... but all of them in your repo are larger than 500 KB (the default cut-off)... and in total they consume 1 GB in your repo. LFS candidate 👍
binary dn 100.0 % 94 94 1034 2 149
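Rows like the one above can be picked out of the full table automatically. The sketch below is a hypothetical filter (`lfs_candidates` is not part of the workshop scripts). It assumes the column layout shown in the outputs in this thread, where the last five columns are LCount, Count, Size, Min, and Max, preceded by the LShare value and a `%` sign. It keeps binary extensions whose LShare is at least a given percentage.

```shell
# Hypothetical filter over git-find-lfs-extensions output (column layout assumed
# from the tables posted in this thread).
# Usage: /path/to/grafting-monorepos/scripts/git-find-lfs-extensions | lfs_candidates 50
lfs_candidates() {
  # $1: minimum LShare in percent (defaults to 50, an arbitrary choice)
  awk -v min="${1:-50}" '
    NF >= 9 && $1 == "binary" && $(NF-5) == "%" && $(NF-6) + 0 >= min + 0 {
      # The extension may span several fields (e.g. "w/o ext doxygen").
      ext = $2
      for (i = 3; i <= NF - 7; i++) ext = ext " " $i
      # Print: total size, LShare, extension
      print $(NF-2) "\t" $(NF-6) "%\t" ext
    }' | sort -rn
}
```

Each output line holds the Size column, the LShare, and the extension, sorted so the extensions that take up the most space come first.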
Which of these result files are of the most interest?
% time git-filter-repo --analyze
Processed 561079 blob sizes
Processed 51072 commits
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 12580 and retry the command.
Processed 51356 commits
Writing reports to .git/filter-repo/analysis...done.
git-filter-repo --analyze 956.38s user 57.45s system 98% cpu 17:05.08 total
README
blob-shas-and-paths.txt
directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
renames.txt
The first 3 lines of each of those could be interesting:
directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
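One way to pull those first entries out of every report at once is sketched below. `preview_reports` is a hypothetical helper name. The default directory is the `.git/filter-repo/analysis` location that the tool reports writing to, and the default of 5 lines assumes each report starts with two header lines followed by the data.

```shell
# Hypothetical helper: show the first N lines of every *-sizes.txt report.
# Usage: preview_reports [analysis_dir] [lines]
preview_reports() (
  dir="${1:-.git/filter-repo/analysis}"
  n="${2:-5}"   # 2 header lines + 3 entries, assuming the layout shown in this thread
  for f in "$dir"/*-sizes.txt; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    printf '==> %s <==\n' "$(basename "$f")"
    head -n "$n" "$f"
    printf '\n'
  done
)
```

Run it from the root of the analyzed repository after `git-filter-repo --analyze` has finished. Remember to replace any sensitive path names before posting the output.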
Example from directories-deleted-sizes.txt:
=== Deleted directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
6411041769 3865142944 2019-11-19 apps/path1
2526122419 2121845383 2019-11-05 apps/path2
1721073323 1670074223 2018-08-27 apps/path3
Example from directories-all-sizes.txt:
=== All directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
66362262776 22878379764 <present> <toplevel>
36655294341 6728535969 <present> external
11090835223 5978643918 <present> apps
@toddocon These apps are just present in the history. They are not in the HEAD commit but you carry around the data with every clone. In the next section we will discuss what you can do about it 😉
`external` sounds like 3rd-party components. Since it ranks second, it might make sense to explore a dependency management system. Can you reveal what language is used in this repo?
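The raw byte counts in these reports are hard to compare at a glance. The sketch below converts the packed size column into binary units. `humanize_report` is a hypothetical helper, and it assumes the four-column layout (unpacked size, packed size, date deleted, name) described in the reports' own `Format:` line.

```shell
# Hypothetical helper: print packed size (human-readable), deletion date, and name
# for each data line of a filter-repo *-sizes.txt report read from stdin.
# Usage: humanize_report < .git/filter-repo/analysis/directories-deleted-sizes.txt
humanize_report() {
  awk '
    function human(b) {
      if (b >= 1073741824) return sprintf("%.2f GiB", b / 1073741824)
      if (b >= 1048576)    return sprintf("%.2f MiB", b / 1048576)
      if (b >= 1024)       return sprintf("%.2f KiB", b / 1024)
      return b " B"
    }
    # Data lines start with two numeric columns; header lines are skipped.
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 4 {
      name = $4
      for (i = 5; i <= NF; i++) name = name " " $i
      print human($2) "\t" $3 "\t" name
    }'
}
```

The packed size is the better indicator of what every clone has to transfer, since it reflects the data after compression.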
the final ones:
==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
8227796558 7999310896 <present> .png
6482556756 1893380227 <present> .dn
2531400184 1608367250 2019-11-18 .k2
==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
2531400184 1608367250 2019-11-18 .k2
490733568 220449697 2019-07-11 .raw
111890545 111923458 2016-05-04 .7z
==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
723820832 259456521 2019-11-05 components/path1
963509828 235721785 <present> external/path1
239188752 223828914 2019-06-03 external/path2
==> path-deleted-sizes.txt <==
=== Deleted paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name(s)
723820832 259456521 2019-11-05 components/path1
239188752 223828914 2019-06-03 external/path1
217239244 203439053 2017-05-20 apps/path1
==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
8227796558 7999310896 <present> .png
6482556756 1893380227 <present> .dn
2531400184 1608367250 2019-11-18 .k2
`png` files use up most of the space by far. Again, using Git LFS might be useful.
==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
2531400184 1608367250 2019-11-18 .k2
490733568 220449697 2019-07-11 .raw
111890545 111923458 2016-05-04 .7z
`k2` files use up significant space although these files are not used anymore.
==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
723820832 259456521 2019-11-05 components/path1
963509828 235721785 <present> external/path1
239188752 223828914 2019-06-03 external/path2
`components/path1` is not used anymore but uses lots of space. Same for `external/path2`... plus that might be a 3rd party?
@toddocon ☝️
git-sizer
Processing blobs: 2222
Processing trees: 2519
Processing commits: 515
Matching commits to trees: 515
Processing annotated tags: 0
Processing references: 6
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 515 | |
| * Total size | 151 KiB | |
| * Trees | | |
| * Count | 2.52 k | |
| * Total size | 824 KiB | |
| * Total tree entries | 22.2 k | |
| * Blobs | | |
| * Count | 2.22 k | |
| * Total size | 90.7 MiB | |
| * Annotated tags | | |
| * Count | 0 | |
| * References | | |
| * Count | 6 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 827 B | |
| * Maximum parents [2] | 2 | |
| * Trees | | |
| * Maximum entries [3] | 28 | |
| * Blobs | | |
| * Maximum size [4] | 980 KiB | |
| | | |
| History structure | | |
| * Maximum history depth | 378 | |
| * Maximum tag depth | 0 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [5] | 56 | |
| * Maximum path depth [5] | 5 | |
| * Maximum path length [5] | 61 B | |
| * Number of files [5] | 209 | |
| * Total size of files [6] | 2.05 MiB | |
| * Number of symlinks | 0 | |
| * Number of submodules | 0 | |
@thomas-schuster your repo looks perfect!
Processing blobs: 4388
Processing trees: 5043
Processing commits: 2073
Matching commits to trees: 2073
Processing annotated tags: 0
Processing references: 48
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 2.07 k | |
| * Total size | 572 KiB | |
| * Trees | | |
| * Count | 5.04 k | |
| * Total size | 3.22 MiB | |
| * Total tree entries | 82.3 k | |
| * Blobs | | |
| * Count | 4.39 k | |
| * Total size | 228 MiB | |
| * Annotated tags | | |
| * Count | 0 | |
| * References | | |
| * Count | 48 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 914 B | |
| * Maximum parents [2] | 2 | |
| * Trees | | |
| * Maximum entries [3] | 44 | |
| * Blobs | | |
| * Maximum size [4] | 58.4 MiB | ****** |
| | | |
| History structure | | |
| * Maximum history depth | 1.85 k | |
| * Maximum tag depth | 0 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [5] | 82 | |
| * Maximum path depth [6] | 8 | |
| * Maximum path length [6] | 144 B | * |
| * Number of files [7] | 399 | |
| * Total size of files [8] | 157 MiB | |
| * Number of symlinks | 0 | |
| * Number of submodules [9] | 1 | |
Do you have any comments or suggestions about the “git-sizer --verbose” results for my work repository?
@neilwang0913 In general your repository is in great shape and there is no reason for concern. The single 58 MB file might be a good candidate for Git LFS, but since your overall repository size is rather low that is no big concern.