centraldogma icon indicating copy to clipboard operation
centraldogma copied to clipboard

Provide a way to remove old commits

Open minwoox opened this issue 2 years ago • 6 comments

Motivation: Central Dogma uses jGit to store data. Due to the nature of Git that stores unlimited history, Central Dogma will eventually get in trouble managing disk usage. We can handle this by maintaining the primary and secondary Git repositories internally. This works in this way:

  • Commits are pushed to the primary Git repository.
  • If the number of commits exceeds the threshold (minRetentionCommits), then the secondary Git repository is created.
  • Commits are pushed to both primary and secondary Git repositories.
  • If the secondary Git repository has the number of commits more than the threshold;
    • The secondary Git repository is promoted to the primary Git repository.
    • The primary Git repository is removed completely.
    • Another secondary Git repository is created.
  • Back to the third.

Modifications:

  • Add CreateRollingRepositoryCommand that creates the rolling repository by the CommitRetentionManagementPlugin.
  • Add GitRepositoryV2 that manages the rolling jGit repositories.
    • The name of internal repositories will be: foo_0000000000, foo_0000000001, foo_0000000002 and so on
      • So that we can only store the suffix of the primary repo to maintain the repository. (i.e. We don't have to maintain the directory information of the secondary repo.)
      • RepositoryMetadataDatabase has the suffix in its file database.
    • GitRepository is not removed for the migration test.
  • Add InternalRepository that has jGit repository and CommitIdDatabase.
  • What happens if the revision of an operation(such as diff, watch, history, etc.) is lower than the firstRevision of the current primary repo?
    • If Revision.INIT(1) is used, the firstRevision is used instead.
      • e.g. diff(INIT, new Revision(100) ...) is equals to diff(new Revision(firstRevisionNumber), new Revision(100) ...)
    • If the Revision between Revision.INIT(1) and the firstRevision is used, a RevisionNotFoundException is raised.
      • except watch and findLatestRevision.

Result:

  • Close #575
  • A repository now removes the commits that are older than minRetentionCommits while keeping the commits made in the recent minRetentionDay.

Todo:

  • Provide a way to set minRetentionCommits and minRetentionDay for each repository.
  • Unused internal repositories are removed by purging service.
  • Support mirroring from CD to external Git.

minwoox avatar Feb 23 '22 13:02 minwoox

Codecov Report

Patch coverage: 70.19% and project coverage change: -2.17% :warning:

Comparison is base (079daeb) 65.69% compared to head (84cb6cd) 63.53%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #681      +/-   ##
============================================
- Coverage     65.69%   63.53%   -2.17%     
- Complexity     3351     3473     +122     
============================================
  Files           358      366       +8     
  Lines         13893    15093    +1200     
  Branches       1496     1687     +191     
============================================
+ Hits           9127     9589     +462     
- Misses         3916     4591     +675     
- Partials        850      913      +63     
Files Changed Coverage Δ
...ecorp/centraldogma/server/CentralDogmaBuilder.java 53.38% <0.00%> (-0.82%) :arrow_down:
...orp/centraldogma/server/CommitRetentionConfig.java 0.00% <0.00%> (ø)
...p/centraldogma/server/command/AbstractCommand.java 47.61% <ø> (ø)
.../linecorp/centraldogma/server/command/Command.java 74.35% <0.00%> (-13.52%) :arrow_down:
...server/command/CreateRollingRepositoryCommand.java 0.00% <0.00%> (ø)
...ogma/server/command/StandaloneCommandExecutor.java 82.25% <0.00%> (-3.46%) :arrow_down:
...internal/storage/DirectoryBasedStorageManager.java 63.31% <ø> (ø)
...ver/internal/storage/repository/CacheableCall.java 65.21% <ø> (ø)
...internal/storage/repository/RepositoryWrapper.java 22.85% <0.00%> (-2.15%) :arrow_down:
...al/storage/repository/cache/CachingRepository.java 92.36% <0.00%> (-2.95%) :arrow_down:
... and 16 more

... and 5 files with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 23 '22 15:02 codecov[bot]

If Revision.INIT(1) is used, the firstRevision is used instead.

This somewhat sounds like a surprising behavior to me. This means a diff command that once worked as expected starts to return completely different result once rolling occurs.

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

trustin avatar Mar 08 '23 10:03 trustin

This issue seems to be a necessary feature to keep the cluster running reliably.

aidanbae avatar Jul 28 '23 08:07 aidanbae

@aidanbae Sorry for making you wait for this. I'm working on it now. Please stay tuned.

minwoox avatar Aug 24 '23 08:08 minwoox

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

That makes sense. Let me apply that approach.

minwoox avatar Aug 25 '23 03:08 minwoox

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

Had a chat with the team, and we decided to raise an exception instead of treating it as an empty state.

This PR is ready, so PTAL. 😉

minwoox avatar Aug 29 '23 12:08 minwoox