
Trim git history in a non-interruptive way

Open TanyaStere42 opened this issue 1 year ago • 9 comments

Feature description and context

Git history has one massive commit that causes timeouts for some people with slower internet connections. Removing it would improve everyone's development experience and make the project more accessible in remote areas.

Investigation notes: https://stackoverflow.com/questions/4515580/how-do-i-remove-the-old-history-from-a-git-repository

Feature sign-off requirements

  • Git history no longer contains the massive commit that causes significant download lag
  • Current developers do not need to do anything to address the change

(TanyaStere42, Nov 13 '24)

I can pick this up!

(Rodhlann, Dec 02 '24)

Awesome, very excited! 🥳

(TanyaStere42, Dec 02 '24)

The files listed below are the largest in our git history. Most of them no longer exist on main and can probably be removed from history, but sutime-stanford-corenlp-models-3.6.0.jar is quite large, and I'm not sure whether it can be removed.

If we were to remove these files, everyone who works on this repo would likely have to re-clone it to restore their local git history. That may go against the "Current developers do not need to do anything to address the change" acceptance criterion.

Hash          Size   File path
9f4704c9c549  21MiB  tessdata/eng.traineddata
64ae74e1e0a0  21MiB  src/main/resources/tessdata/eng.traineddata
bc24e2720e37  21MiB  frontend/public/forest.png
afb8ef60dc56  26MiB  src/main/webapp/video/reportingencounters/ReportingAnEncounter.swf
8feb73fa5a1d  27MiB  postgis/simplified_water_polygons.sql.gz
914341124af1  30MiB  src/main/webapp/video/usingpaintnet/Using Paint.NET.swf
08d4ec4e2e9a  44MiB  src/main/webapp/video/scanningformatches/ScanningForAMatch.swf
7aa76c4ed2a8  44MiB  src/main/webapp/video/RolexBradNorman.wmv
1574db1090f6  68MiB  src/main/webapp/video/approvingencounters/ApprovingAnEncounter.swf
52852e1955f3  72MiB  local-repo/sutime-stanford-corenlp-models/sutime-stanford-corenlp-models/3.6.0/sutime-stanford-corenlp-models-3.6.0.jar
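One common way to produce a listing like this is to walk every object reachable from any ref and ask git for each blob's size. The sketch below wraps the well-known `git rev-list --objects --all | git cat-file --batch-check` pipeline in Python. It is only an illustration and not necessarily how the table above was generated; the helper names (`parse_batch_check`, `largest_blobs`, `format_mib`, `print_report`) are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import NamedTuple


class Blob(NamedTuple):
    """A blob reachable from history: object id, size in bytes, and one path it appeared at."""
    oid: str
    size: int
    path: str


def parse_batch_check(text: str) -> list[Blob]:
    """Parse lines shaped like '<type> <oid> <size> <path>' and keep only blobs."""
    blobs = []
    for line in text.splitlines():
        # maxsplit=3 keeps paths that contain spaces (e.g. "Using Paint.NET.swf") intact
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob":
            _type, oid, size, path = parts
            blobs.append(Blob(oid, int(size), path))
    return blobs


def largest_blobs(blobs: list[Blob], n: int = 10) -> list[Blob]:
    """Return the n largest blobs, biggest first."""
    return sorted(blobs, key=lambda b: b.size, reverse=True)[:n]


def format_mib(size: int) -> str:
    """Render a byte count as whole mebibytes, e.g. 22020096 -> '21MiB'."""
    return f"{size / 2**20:.0f}MiB"


def list_history_blobs(repo: str = ".") -> list[Blob]:
    """Run the rev-list | cat-file pipeline inside `repo` (requires git on PATH)."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    checked = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(checked)


def print_report(repo: str = ".", n: int = 10) -> None:
    """Print abbreviated hash, size, and path for the n largest blobs in history."""
    for blob in largest_blobs(list_history_blobs(repo), n):
        print(blob.oid[:12], format_mib(blob.size), blob.path)
```

Calling `print_report()` from inside a clone prints rows in the same shape as the table. Note that `%(objectsize)` reports the uncompressed object size; swapping in `%(objectsize:disk)` shows the packed, possibly delta-compressed size, which is closer to what a clone actually downloads.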

(Rodhlann, Dec 03 '24)

Ooo, good news! Ticket #908 gets rid of the CoreNLP .jar file, so we just need to get that merged in.

As for the non-interruptive requirement: my biggest concern is making sure that anyone who is actively developing right now knows the impact. We can handle that with broad messaging and direct communication, so I think that should be fine?

(TanyaStere42, Dec 04 '24)

How fortunate! That sounds good, then. I'll try this on my own Wildbook fork first to figure out what the actual impact is, so any communication about it is clear before we push it out to everyone else.

(Rodhlann, Dec 04 '24)

It looks like this will reduce the .git directory from 584M to 232M, a 60% size reduction!
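As a quick check of the quoted percentage, using only the before and after sizes from this comment:

```latex
\frac{584\,\mathrm{M} - 232\,\mathrm{M}}{584\,\mathrm{M}} = \frac{352}{584} \approx 0.603 \approx 60\%
```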

(Rodhlann, Dec 06 '24)

FYI, simplified_water_polygons.sql.gz is a "Flukebook-only" thing... and I'm not even sure it's used over there anymore. I'm fine with taking it out of git and making it something we load in separately (if it's even used, I mean).

(naknomum, Dec 10 '24)

Update on this issue: after meeting with @TanyaStere42, @naknomum, and @goddesswarship, we have decided that the major shift to a "new" git baseline should wait until a future milestone, where it can be justified as part of the move to the next major version of Wildbook (how exciting!).

In the meantime, we are going to investigate shallow-clone behavior (git clone --depth and git pull --depth) to see whether it can help contributors download smaller slices of the repository without causing problems for the larger project, for local development, or for pull requests.

Preliminary testing is promising: the shallow-cloned project comes to 76M, compared with the 500-800M downloads we currently see when cloning fresh.
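The shallow workflow under investigation comes down to a handful of git invocations. The Python helpers below only build the argument lists, so they can be inspected or tested without network access; the function names and the default depth of 1 are illustrative assumptions, not settings the team has chosen, and the 76M figure above comes from the team's own test rather than from this code.

```python
from __future__ import annotations

import subprocess


def shallow_clone_cmd(url: str, dest: str | None = None, depth: int = 1) -> list[str]:
    """Clone only the most recent `depth` commits (git's --depth also implies --single-branch)."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    cmd = ["git", "clone", "--depth", str(depth), url]
    if dest is not None:
        cmd.append(dest)
    return cmd


def deepen_cmd(extra_commits: int) -> list[str]:
    """Fetch `extra_commits` more commits beyond the current shallow boundary."""
    if extra_commits < 1:
        raise ValueError("extra_commits must be a positive integer")
    return ["git", "fetch", f"--deepen={extra_commits}"]


def unshallow_cmd() -> list[str]:
    """Turn a shallow clone into a complete one (downloads the full history)."""
    return ["git", "fetch", "--unshallow"]


def run(cmd: list[str], cwd: str = ".") -> None:
    """Execute a built command, raising CalledProcessError if git fails."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

A contributor could start with `run(shallow_clone_cmd(url))`, later pull in more history with `deepen_cmd`, or convert to a full clone with `unshallow_cmd` if a rebase or `git log` needs older commits. Because `--depth` implies `--single-branch`, anyone who needs other branches can pass `--no-single-branch` at clone time or run `git remote set-branches origin '*'` before fetching.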

I will be running an experiment on another issue to help identify any frustrating or confusing behavior in the shallow git workflow, so we can evaluate and document the experience for future contributors!

Additional context on this process and the related decisions can be found in the WildMe Discord, starting here: https://discord.com/channels/1136422697094619266/1233563838339743815/1314393194557079575

(Rodhlann, Dec 13 '24)

Removing the milestone for now; let's revisit this before too long.

(vkirkl, Jan 21 '25)