Trim git history in a non-interruptive way
Feature description and context
Git history has one massive commit that causes timeouts for some people with slower internet connections. Getting rid of it would improve everyone's dev experience and make the work more accessible in remote areas.
Investigation notes: https://stackoverflow.com/questions/4515580/how-do-i-remove-the-old-history-from-a-git-repository
Feature sign-off requirements
- Git history no longer has a massive commit that causes significant download lag
- Current developers do not need to do anything to address the change
I can pick this up!
awesome, very excited :partying_face:
The files listed below are the largest files in our git history. Most of them no longer exist on main and can probably be removed from the git history, but sutime-stanford-corenlp-models-3.6.0.jar is quite large, and I'm not sure if it can be removed.
If we were to remove these files everyone who works on this repo would likely have to re-clone the repo to restore their local git history. That may go against the "Current developers do not need to do anything to address the change" acceptance criteria.
| Hash | Size | File Path | Exists |
|---|---|---|---|
| 9f4704c9c549 | 21MiB | tessdata/eng.traineddata | ❌ |
| 64ae74e1e0a0 | 21MiB | src/main/resources/tessdata/eng.traineddata | ❌ |
| bc24e2720e37 | 21MiB | frontend/public/forest.png | ❌ |
| afb8ef60dc56 | 26MiB | src/main/webapp/video/reportingencounters/ReportingAnEncounter.swf | ❌ |
| 8feb73fa5a1d | 27MiB | postgis/simplified_water_polygons.sql.gz | ✅ |
| 914341124af1 | 30MiB | src/main/webapp/video/usingpaintnet/Using Paint.NET.swf | ❌ |
| 08d4ec4e2e9a | 44MiB | src/main/webapp/video/scanningformatches/ScanningForAMatch.swf | ❌ |
| 7aa76c4ed2a8 | 44MiB | src/main/webapp/video/RolexBradNorman.wmv | ❌ |
| 1574db1090f6 | 68MiB | src/main/webapp/video/approvingencounters/ApprovingAnEncounter.swf | ❌ |
| 52852e1955f3 | 72MiB | local-repo/sutime-stanford-corenlp-models/sutime-stanford-corenlp-models/3.6.0/sutime-stanford-corenlp-models-3.6.0.jar | ✅ |
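For anyone who wants to reproduce a listing like this on their own fork, here is a rough sketch of one common approach. It is not necessarily how the table above was produced. The function name `largest_blobs` is made up for illustration, sizes are printed in bytes rather than MiB, and "exists" is checked against `HEAD` rather than `main` specifically.

```shell
# Print the N largest blobs in history (default 10), largest first.
# Columns (tab-separated): short hash, size in bytes, path, present at HEAD?
# `largest_blobs` is a hypothetical helper name, not part of git.
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2nr |
    head -n "$n" |
    while read -r hash size path; do
      # The path counts as existing if it resolves in the commit at HEAD.
      if git cat-file -e "HEAD:$path" 2>/dev/null; then exists=yes; else exists=no; fi
      printf '%.12s\t%s\t%s\t%s\n' "$hash" "$size" "$path" "$exists"
    done
}
```

Run `largest_blobs 10` from inside a full (non-shallow) clone with `main` checked out. Each blob is reported once, under the first path git encounters it at, so a file that moved may show up under an older path.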
ooo, good news! ticket #908 gets rid of the corenlp .jar file, so we just need to get that merged in.
As for the non-interrupt: my biggest concern is making sure that anyone who is actively developing right now knows the impact. We can handle that with broad messaging and direct communication, so I think that should be fine?
How fortunate! That sounds good then. I will test this on my own Wildbook fork to figure out what the actual impact might be, so we can clarify any communication around it before pushing it out to everyone else.
It looks like this will reduce the .git dir from 584M --> 232M, a 60% size reduction!
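In case it helps others compare before and after on their own forks, here is a small sketch for measuring the `.git` directory and working out the percentage. The helper names `git_dir_kib` and `reduction_pct` are made up for illustration, and `du` output can vary a little between filesystems.

```shell
# Size of the current repository's .git directory, in KiB.
git_dir_kib() {
  du -sk "$(git rev-parse --git-dir)" | cut -f1
}

# Percentage reduction going from size $1 to size $2 (same units).
reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

reduction_pct 584 232   # the figures quoted above; prints 60.3
```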
fyi, simplified_water_polygons.sql.gz is a "flukebook only" thing... and i am not even sure it is used over there any more. i am fine with taking this outta git and making it something we load in separately. (if it is even used, i mean.)
Update on this Issue: after meeting with @TanyaStere42 @naknomum and @goddesswarship we have decided that the major shift to a "new" git baseline should wait until a future milestone, where it can be justified as part of a shift to the next major version of Wildbook (how exciting!)
In the meantime, we are going to investigate `git clone --depth` and `git pull --depth` to see whether they help contributors pull smaller slices of the git repository without causing issues for the larger project, local development, or pull requests.
Preliminary testing is promising: the shallow-cloned project ends up at 76M, rather than the 500-800M downloads we currently see with a fresh full clone.
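For reference, a minimal sketch of what a shallow workflow could look like. The helper names (`shallow_clone`, `deepen`, `unshallow`) and the default depths are assumptions for illustration, not a settled recommendation; the underlying git options are standard.

```shell
# Clone only the most recent commit(s). Note that --depth implies
# --single-branch, so other remote branches are not fetched by default.
shallow_clone() {
  url="$1"; dest="$2"; depth="${3:-1}"
  git clone --depth "$depth" "$url" "$dest"
}

# Pull in more history later, e.g. when a rebase or `git log` needs it.
deepen() {
  git -C "$1" fetch --deepen "${2:-50}"
}

# Convert the shallow clone into a full one if it ever becomes necessary.
unshallow() {
  git -C "$1" fetch --unshallow
}
```

Usage would be something like `shallow_clone <repo-url> Wildbook`, followed by `deepen Wildbook 100` if more history is needed. `git rev-parse --is-shallow-repository` reports whether a given clone is still shallow.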
I will be running an experiment on another issue to help identify any frustrating or confusing behavior in the shallow git workflow, so we can evaluate and document the experience for future contributors!
Additional context around this process and related decisions can be found starting here in the WildMe discord: https://discord.com/channels/1136422697094619266/1233563838339743815/1314393194557079575
Removing milestone for now ... let's revisit this before too long.