Activity 2: Analysis

Open selkins13 opened this issue 4 years ago • 20 comments

Duration: 20 minutes

We will use several analysis tools to identify bad practices in a repository. To do so, we will use the following commands:

  • git-sizer
  • git-find-dirs-many-files
  • git-find-lfs-extensions
  • git-find-dirs-unwanted
  • git-filter-repo

Before starting any analysis, pick a repository that you would like to analyze.

:warning: Throughout this exercise, make sure you don't post any private information that should not be shared publicly.

Clone this repository; we have added all the tools to it to make the workshop more convenient:

# Clone the repository
git clone https://github.com/githubuniverseworkshops/grafting-monorepos.git

# Or use the GitHub CLI
gh repo clone githubuniverseworkshops/grafting-monorepos

Stats of repo size: git-sizer

  1. Download the corresponding compiled version of git-sizer.

Optionally, if you are on a Mac, you can install git-sizer using Homebrew.

  2. Run the tool from the root of the repository to analyze:
/path/to/git-sizer --verbose
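
With `--verbose`, git-sizer prints every statistic, so the rows that deserve attention can get lost in the table. As a minimal sketch, the filter below keeps only the rows whose "Level of concern" column contains `*` or `!`. The helper name `flagged_rows` is hypothetical, and the sketch assumes the pipe-delimited table layout that git-sizer prints.

```shell
# Hypothetical helper: keep only git-sizer table rows that carry a concern marker.
# Assumes the "| Name | Value | Level of concern |" layout printed by git-sizer.
flagged_rows() {
  # With -F'|' the fields are: $1 empty, $2 name, $3 value, $4 level of concern.
  # Only $4 is inspected, so the "*" bullets in the name column are ignored.
  awk -F'|' 'NF >= 4 && $4 ~ /[*!]/'
}

# Usage, from the root of the repository to analyze:
#   /path/to/git-sizer --verbose | flagged_rows
```

The same filter can be applied to a saved copy of the table before posting it in a comment.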

Find files that should be in LFS: git-find-lfs-extensions

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-lfs-extensions

Print directories with the number of files contained: git-find-dirs-many-files

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-many-files

Find dirs that should not be committed: git-find-dirs-unwanted

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-unwanted | head -n 15

Analyze the repository: git-filter-repo --analyze

  1. Clone the git-filter-repo tool
  2. Execute the tool from the root of the repository to analyze
/path/to/git-filter-repo/git-filter-repo --analyze

Report out

Report your findings from the above commands in the comments section below. If possible, include answers to the following questions:

  • Do you find any patterns?
  • Was there anything surprising?

:warning: Throughout this exercise, make sure you don't post any private information that should not be shared publicly.

For examples and more information, please see README.md -> Activity 2.

selkins13 avatar Nov 20 '20 20:11 selkins13

Here's my git-sizer output:

Processing blobs: 186589                        
Processing trees: 323092                        
Processing commits: 51356                        
Matching commits to trees: 51356                        
Processing annotated tags: 42                        
Processing references: 783                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  51.4 k   |                                |
|   * Total size               |  13.4 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   323 k   |                                |
|   * Total size               |   237 MiB |                                |
|   * Total tree entries       |  6.14 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |   187 k   |                                |
|   * Total size               |  51.4 GiB | *****                          |
| * Annotated tags             |           |                                |
|   * Count                    |    42     |                                |
| * References                 |           |                                |
|   * Count                    |   783     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  2.08 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |  1.58 k   | *                              |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   198 MiB | ********************           |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  11.1 k   |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |  5.08 k   | **                             |
| * Maximum path depth     [6] |    25     | **                             |
| * Maximum path length    [6] |   280 B   | **                             |
| * Number of files        [7] |  29.2 k   |                                |
| * Total size of files    [8] |  8.30 GiB | ********                       |
| * Number of symlinks     [9] |   175     |                                |
| * Number of submodules  [10] |    20     |                                |

toddocon avatar Dec 11 '20 19:12 toddocon

@toddocon That looks good in general. The 200MB file might be a good candidate for Git LFS.

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

Here is my git-sizer output:

Processing blobs: 51587
Processing trees: 100112
Processing commits: 15133
Matching commits to trees: 15133
Processing annotated tags: 17
Processing references: 785
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  15.1 k   |                                |
|   * Total size               |  6.52 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   100 k   |                                |
|   * Total size               |  42.2 MiB |                                |
|   * Total tree entries       |  1.16 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  51.6 k   |                                |
|   * Total size               |  1.21 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |    17     |                                |
| * References                 |           |                                |
|   * Count                    |   785     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  33.5 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    87     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  20.3 MiB | **                             |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.27 k   |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |   623     |                                |
| * Maximum path depth     [7] |    10     | *                              |
| * Maximum path length    [8] |   111 B   | *                              |
| * Number of files        [6] |  2.81 k   |                                |
| * Total size of files    [9] |  37.9 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

alubchuk avatar Dec 11 '20 19:12 alubchuk

My results 😬

git-sizer

git-sizer --verbose
Processing blobs: 19360                        
Processing trees: 34523                        
Processing commits: 7588                        
Matching commits to trees: 7588                        
Processing annotated tags: 0                        
Processing references: 116                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  7.59 k   |                                |
|   * Total size               |  3.02 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  34.5 k   |                                |
|   * Total size               |  16.5 MiB |                                |
|   * Total tree entries       |   444 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  19.4 k   |                                |
|   * Total size               |  1.88 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |   116     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  39.5 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   193     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   521 MiB | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  3.20 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   218     |                                |
| * Maximum path depth     [6] |    10     | *                              |
| * Maximum path length    [7] |   138 B   | *                              |
| * Number of files        [5] |  1.26 k   |                                |
| * Total size of files    [8] |   813 MiB |                                |
| * Number of symlinks     [9] |     1     |                                |
| * Number of submodules       |     0     |                                |

git-find-lfs-extensions

/Users/brentconn/IdeaProjects/grafting-monorepos/scripts/git-find-lfs-extensions

Type           Extension              LShare    LCount     Count      Size       Min       Max
-------        ---------             -------   -------   -------   -------   -------   -------
all            *                       1.0 %        17      1141       144         0       105
binary         bson                   16.0 %         2        12       107         0       105
binary         gz                     10.0 %         4        39         7         0         2
binary         png                     3.0 %         5       146        18         0         0
text           json                    3.0 %         4       101         4         0         1
text           js                      0.0 %         1       553         3         0         1
text           svg                    12.0 %         1         8         1         0         0

Can't post much else due to sensitive information :)

onetrickwolf avatar Dec 11 '20 19:12 onetrickwolf

Extensions:

% ./git-find-lfs-extensions 

Type             Extension                      LShare    LCount     Count      Size       Min       Max
-------          ---------                     -------   -------   -------   -------   -------   -------
all              *                               4.0 %       534     11707      3351         0       149
binary           dn                            100.0 %        94        94      1034         2       149
binary           dll                            72.0 %       108       149       497         0        42
binary           lib                            43.0 %        16        37       271         0        93
binary           so                             35.0 %        32        90       164         0        38
binary           png                             8.0 %        40       493       146         0        21
binary           dylib                          70.0 %        19        27       131         0        32
binary           woff                           64.0 %         9        14       122         0        13
binary           otf                            70.0 %         7        10       116         0        16
binary           psd                            20.0 %        13        62        91         0        51
binary           gif                            74.0 %        69        93        86         0         3
binary           a                              44.0 %         4         9        62         0        31
binary           jar                           100.0 %         1         1        59        59        59
binary           dds                            34.0 %         8        23        57         0        16
binary           exe                            76.0 %        10        13        42         0        14
binary           pdb                           100.0 %         7         7        33         0        13
binary           8bf                           100.0 %         2         2        23         7        15
binary           exr                            27.0 %         9        33        18         0         5
binary           jpg                             7.0 %         3        42        15         0         9
binary           glb                            11.0 %         2        17        11         0         6
binary           bin                            28.0 %         2         7         6         0         3
binary           gz                            100.0 %         3         3         4         0         2
binary           mod                            25.0 %         3        12         4         0         1
binary           eps                           100.0 %         1         1         2         2         2
binary           pdf                            13.0 %         2        15         3         0         0
binary           texture                        50.0 %         1         2         1         0         1
binary           csf                            40.0 %         2         5         1         0         0
binary           tga                           100.0 %         1         1         0         0         0
binary           blend                         100.0 %         1         1         0         0         0
binary w/o ext   ZZZZpm                        100.0 %         6         6        33         2         9
binary w/o ext   ZZZZZ_caps                    100.0 %        10        10        32         1         8
binary w/o ext   standard multiplugin          100.0 %         2         2        25        11        13
binary w/o ext   doxygen                       100.0 %         1         1        17        17        17
binary w/o ext   amtlib                        100.0 %         4         4         8         2         2
binary w/o ext   getZZZZZZ                     100.0 %         1         1         6         6         6
binary w/o ext   updaternotifications          100.0 %         4         4         2         0         0
binary w/o ext   bcp                           100.0 %         1         1         0         0         0
text             obj                            27.0 %         3        11        20         0        18
text             h                               0.0 %        12      3819        57         0         6
text             fbx                            25.0 %         2         8         9         0         8
text             c                               0.0 %         1       154        10         0         6
text             hpp                             0.0 %         2       742        16         0         2
text             hdr                           100.0 %         3         3         4         0         2
text             js                              0.0 %         2       340        18         0         1
text             html                            0.0 %         1       287         5         0         2
text             dat                            50.0 %         4         8         2         0         0
text             cpp                             0.0 %         2      1851        30         0         1
text             f90                             5.0 %         1        19         1         0         0
text             fi                              6.0 %         1        16         0         0         0
text w/o ext     configure                     100.0 %         2         2         1         0         0

toddocon avatar Dec 11 '20 19:12 toddocon

@alubchuk your repo looks really good. Again, the 20MB file might be a good candidate for LFS. In general, I recommend using LFS for files >1MB that are changed regularly.
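
To see which files cross that 1MB mark, a sketch along these lines lists every blob in the history at or above a size threshold, largest first. The function names `large_blobs` and `find_large_blobs` are hypothetical, and the default threshold is only an illustration of the rule of thumb above. The sketch does not tell you how often a file changes, so that part still needs a manual look.

```shell
# Filter for "type sha size path" lines: print blobs of at least MIN bytes,
# largest first. MIN defaults to 1 MiB, an illustrative threshold.
large_blobs() {
  min="${1:-1048576}"
  awk -v min="$min" '$1 == "blob" && $3 >= min {
    path = $0
    sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)   # drop type, sha and size; keep the path
    print $3, path
  }' | sort -rn
}

# Walk every object reachable from any ref and report the large blobs.
# Run from the root of the repository to analyze.
find_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    large_blobs "$@"
}
```

Calling `find_large_blobs` with no argument uses the 1 MiB default; `find_large_blobs 20000000` would raise the bar to roughly 20MB.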

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

@onetrickwolf A 521 MB file is definitely a candidate for LFS.

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

@toddocon You have a lot of compiled libraries (dll, lib, so) in your repo. If possible, it would be good to move them to an artifact store or use Git LFS for them.

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

@toddocon I don't know what dn files are... but all of them in your repo are larger than 500kb (the default cut off)... and in total they consume 1GB in your repo. LFS candidate 👍

binary           dn                            100.0 %        94        94      1034         2       149
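
As a rough sketch of how a report like this could feed Git LFS, the filter below turns its binary rows into `.gitattributes` lines of the kind that `git lfs track` writes. The helper name `lfs_patterns` and the 50 % default cut-off are assumptions made for illustration, and the column layout is assumed to match the output posted above.

```shell
# Hypothetical helper: turn git-find-lfs-extensions rows into .gitattributes lines.
# Keeps "binary <ext> <LShare> % ..." rows whose LShare is at least MIN percent
# (default 50, an arbitrary illustrative cut-off). Rows for files without an
# extension ("binary w/o ext ...") are skipped because their $4 is not "%".
lfs_patterns() {
  min="${1:-50}"
  awk -v min="$min" '$1 == "binary" && $4 == "%" && $3 + 0 >= min {
    printf "*.%s filter=lfs diff=lfs merge=lfs -text\n", $2
  }'
}

# Usage, from the root of the repository to analyze:
#   /path/to/grafting-monorepos/scripts/git-find-lfs-extensions | lfs_patterns 70
```

Review the generated lines before appending them to `.gitattributes`. Tracking a pattern only affects files added from then on; the copies already in the history stay where they are.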

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

Which of these result files are of the most interest?

% time git-filter-repo --analyze

Processed 561079 blob sizes
Processed 51072 commits
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 12580 and retry the command.
Processed 51356 commits
Writing reports to .git/filter-repo/analysis...done.
git-filter-repo --analyze  956.38s user 57.45s system 98% cpu 17:05.08 total


README
blob-shas-and-paths.txt
directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
renames.txt

toddocon avatar Dec 11 '20 19:12 toddocon

The first 3 lines of each of those could be interesting:

directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
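
One way to collect those excerpts in a single step is sketched below. The function name `report_heads` is hypothetical. The default directory is the one named in the `Writing reports to` line above, and the default of five lines assumes two header lines followed by three entries.

```shell
# Print the first lines of each size report written by `git-filter-repo --analyze`.
# DIR defaults to .git/filter-repo/analysis; N defaults to 5
# (assumed: two header lines plus three entries).
report_heads() {
  dir="${1:-.git/filter-repo/analysis}"
  n="${2:-5}"
  for name in directories-all-sizes directories-deleted-sizes \
              extensions-all-sizes extensions-deleted-sizes \
              path-all-sizes path-deleted-sizes; do
    file="$dir/$name.txt"
    [ -f "$file" ] || continue          # skip reports that were not generated
    printf '==> %s <==\n' "$name.txt"
    head -n "$n" "$file"
    printf '\n'
  done
}

# Usage, from the root of the analyzed repository:
#   report_heads
```

Remember to replace sensitive path names before posting the output.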

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

Example from directories-deleted-sizes.txt:

=== Deleted directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
  6411041769 3865142944 2019-11-19 apps/path1
  2526122419 2121845383 2019-11-05 apps/path2
  1721073323 1670074223 2018-08-27 apps/path3

Example from directories-all-sizes.txt:

=== All directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
  66362262776 22878379764 <present>  <toplevel>
  36655294341 6728535969 <present>  external
  11090835223 5978643918 <present>  apps
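
The sizes in these reports are plain byte counts, which are hard to compare at a glance. A small sketch (the helper name `with_gib` is hypothetical) appends GiB figures to each data row; for the `<toplevel>` row above that works out to about 61.8 GiB unpacked and 21.3 GiB packed.

```shell
# Hypothetical helper: append GiB figures to "unpacked packed date name" report rows.
# Lines whose first two fields are not plain numbers (the headers) pass through unchanged.
with_gib() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
         printf "%s  (%.2f GiB unpacked, %.2f GiB packed)\n", $0, $1 / 1073741824, $2 / 1073741824
         next
       }
       { print }'
}

# Usage, from the root of the analyzed repository:
#   head -n 5 .git/filter-repo/analysis/directories-all-sizes.txt | with_gib
```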

toddocon avatar Dec 11 '20 19:12 toddocon

@toddocon These apps are just present in the history. They are not in the HEAD commit but you carry around the data with every clone. In the next section we will discuss what you can do about it 😉


external sounds like 3rd party components. Its second-place ranking suggests that it might make sense to explore a dependency management system. Can you reveal what language is used in this repo?

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

the final ones:

==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  8227796558 7999310896 <present>  .png
  6482556756 1893380227 <present>  .dn
  2531400184 1608367250 2019-11-18 .k2

==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  2531400184 1608367250 2019-11-18 .k2
   490733568  220449697 2019-07-11 .raw
   111890545  111923458 2016-05-04 .7z

==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
   723820832  259456521 2019-11-05 components/path1
   963509828  235721785 <present>  external/path1
   239188752  223828914 2019-06-03 external/path2

==> path-deleted-sizes.txt <==
=== Deleted paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name(s)
   723820832  259456521 2019-11-05 components/path1
   239188752  223828914 2019-06-03 external/path1
   217239244  203439053 2017-05-20 apps/path1

toddocon avatar Dec 11 '20 19:12 toddocon

==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  8227796558 7999310896 <present>  .png
  6482556756 1893380227 <present>  .dn
  2531400184 1608367250 2019-11-18 .k2

png files use up most of the space by far. Again using Git LFS might be useful.

==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  2531400184 1608367250 2019-11-18 .k2
   490733568  220449697 2019-07-11 .raw
   111890545  111923458 2016-05-04 .7z

k2 files use up significant space although these files are not used anymore.

==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
   723820832  259456521 2019-11-05 components/path1
   963509828  235721785 <present>  external/path1
   239188752  223828914 2019-06-03 external/path2

components/path1 is not used anymore but uses lots of space. Same for external/path2 ... plus that might be a 3rd party?

@toddocon ☝️

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

git-sizer

Processing blobs: 2222
Processing trees: 2519
Processing commits: 515
Matching commits to trees: 515
Processing annotated tags: 0
Processing references: 6
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |   515     |                                |
|   * Total size               |   151 KiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  2.52 k   |                                |
|   * Total size               |   824 KiB |                                |
|   * Total tree entries       |  22.2 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  2.22 k   |                                |
|   * Total size               |  90.7 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |     6     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   827 B   |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    28     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   980 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   378     |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |    56     |                                |
| * Maximum path depth     [5] |     5     |                                |
| * Maximum path length    [5] |    61 B   |                                |
| * Number of files        [5] |   209     |                                |
| * Total size of files    [6] |  2.05 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

thomas-schuster avatar Dec 11 '20 19:12 thomas-schuster

@thomas-schuster your repo looks perfect!

larsxschneider avatar Dec 11 '20 19:12 larsxschneider

Processing blobs: 4388
Processing trees: 5043
Processing commits: 2073
Matching commits to trees: 2073
Processing annotated tags: 0
Processing references: 48
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  2.07 k   |                                |
|   * Total size               |   572 KiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  5.04 k   |                                |
|   * Total size               |  3.22 MiB |                                |
|   * Total tree entries       |  82.3 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  4.39 k   |                                |
|   * Total size               |   228 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |    48     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   914 B   |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    44     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  58.4 MiB | ******                         |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  1.85 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |    82     |                                |
| * Maximum path depth     [6] |     8     |                                |
| * Maximum path length    [6] |   144 B   | *                              |
| * Number of files        [7] |   399     |                                |
| * Total size of files    [8] |   157 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules   [9] |     1     |                                |

neilwang0913 avatar Dec 11 '20 21:12 neilwang0913

Are there any comments or suggestions about the “git-sizer --verbose” result for my work repository?

neilwang0913 avatar Dec 11 '20 21:12 neilwang0913

@neilwang0913 In general your repository is in great shape and there is no reason for concern. The single 58 MB file might be a good candidate for Git LFS, but since your overall repository size is rather low that is no big concern.

larsxschneider avatar Dec 13 '20 19:12 larsxschneider