repack implementation

Open jelmer opened this issue 9 years ago • 3 comments

Dulwich should have a way of repacking a repository.

jelmer, May 23 '15 19:05

+1

mjhennig, Jul 09 '16 13:07

I've tried the initial version of the repack function. In https://github.com/jelmer/dulwich/issues/549 you wrote:

  • it currently caches all objects in memory while it repacks
  • it's not triggered automatically, you have to call it manually

My initial trial also shows what appear to be performance problems in a scenario I expected to be trivial: an initial clone followed by an immediate repack. Most of the time seems to be spent iterating through the objects in the old packs (repack 3 to 4) and then writing the new pack (repack 4 to 5):

2017-08-24 16:13:00.051Z:INFO:15732:__main__:321:git - Counting objects: 116428
2017-08-24 16:13:00.078Z:INFO:15732:__main__:321:git - Counting objects: 124009, done.
...
2017-08-24 16:13:00.259Z:INFO:15732:__main__:321:git - Compressing objects: 100% (24115/24115)
2017-08-24 16:13:00.263Z:INFO:15732:__main__:321:git - Compressing objects: 100% (24115/24115), done.
2017-08-24 16:13:03.021Z:INFO:15732:__main__:321:git - Total 124009 (delta 99887), reused 123908 (delta 99808)
2017-08-24 16:13:15.190Z:INFO:15732:dulwich.object_store:354:@@@ repack 1
2017-08-24 16:13:15.190Z:INFO:15732:dulwich.object_store:358:@@@ repack 2
2017-08-24 16:13:15.190Z:INFO:15732:dulwich.object_store:361:@@@ repack 3
2017-08-24 16:13:35.501Z:INFO:15732:dulwich.object_store:365:@@@ repack 4
2017-08-24 16:14:28.001Z:INFO:15732:dulwich.object_store:368:@@@ repack 5
2017-08-24 16:14:28.001Z:INFO:15732:dulwich.object_store:371:@@@ repack 6
2017-08-24 16:14:28.042Z:INFO:15732:dulwich.object_store:374:@@@ repack 7
2017-08-24 16:14:35.979Z:INFO:15732:__main__:321:git - Counting objects: 1
2017-08-24 16:14:36.514Z:INFO:15732:__main__:321:git - Counting objects: 103356, done.
...
2017-08-24 16:14:36.621Z:INFO:15732:__main__:321:git - Compressing objects: 100%  (57708/57708), done.
2017-08-24 16:14:37.044Z:INFO:15732:__main__:321:git - Total 103356 (delta 45615), reused 103235 (delta 45552)
2017-08-24 16:14:57.030Z:INFO:15732:dulwich.object_store:354:@@@ repack 1
2017-08-24 16:14:57.030Z:INFO:15732:dulwich.object_store:358:@@@ repack 2
2017-08-24 16:14:57.030Z:INFO:15732:dulwich.object_store:361:@@@ repack 3
2017-08-24 16:15:09.760Z:INFO:15732:dulwich.object_store:365:@@@ repack 4
2017-08-24 16:15:25.288Z:INFO:15732:dulwich.object_store:368:@@@ repack 5
2017-08-24 16:15:25.289Z:INFO:15732:dulwich.object_store:371:@@@ repack 6
2017-08-24 16:15:25.300Z:INFO:15732:dulwich.object_store:374:@@@ repack 7
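
To make the gaps between the `@@@ repack N` markers explicit, the intervals for the first run can be computed from the logged timestamps. This is only a small sketch: the `MARKERS` table and the `stage_durations` helper are names made up for this illustration, and the timestamps are copied by hand from the log above.

```python
from datetime import datetime

# Timestamps copied from the "@@@ repack N" lines of the first run above.
MARKERS = {
    1: "2017-08-24 16:13:15.190",
    2: "2017-08-24 16:13:15.190",
    3: "2017-08-24 16:13:15.190",
    4: "2017-08-24 16:13:35.501",
    5: "2017-08-24 16:14:28.001",
    6: "2017-08-24 16:14:28.001",
    7: "2017-08-24 16:14:28.042",
}


def stage_durations(markers):
    """Return {(n, n_next): seconds} for each pair of consecutive markers."""
    times = {
        n: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
        for n, ts in markers.items()
    }
    ordered = sorted(times)
    return {
        (a, b): (times[b] - times[a]).total_seconds()
        for a, b in zip(ordered, ordered[1:])
    }


durations = stage_durations(MARKERS)
for (a, b), secs in durations.items():
    print(f"repack {a} -> {b}: {secs:.3f}s")
```

For the first run this gives roughly 20.3 s between markers 3 and 4 and 52.5 s between markers 4 and 5; every other stage takes well under a second.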

I added instrumentation as shown:

+        _logger.info('@@@ repack 1')
         loose_objects = set()
         for sha in self._iter_loose_objects():
             loose_objects.add(self._get_loose_object(sha))
+        _logger.info('@@@ repack 2')
         objects = {(obj, None) for obj in loose_objects}
         old_packs = list(self.packs)
+        _logger.info('@@@ repack 3')
         for pack in old_packs:
             objects.update((obj, None) for obj in pack.iterobjects())
 
+        _logger.info('@@@ repack 4')
         self.add_objects(objects)
 
+        _logger.info('@@@ repack 5')
         for obj in loose_objects:
             self._remove_loose_object(obj.id)
+        _logger.info('@@@ repack 6')
         for pack in old_packs:
             self._remove_pack(pack)
+        _logger.info('@@@ repack 7')
         return len(objects)
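
An alternative to hand-placed log calls would be a small timing context manager, so each stage reports its own duration. The sketch below is not part of dulwich: `timed_stage` and the `timings` dict are hypothetical names, and the body of the `with` block is just a stand-in for a real repack stage.

```python
import logging
import time
from contextlib import contextmanager

_logger = logging.getLogger(__name__)


@contextmanager
def timed_stage(name, results=None):
    """Log how long the enclosed block takes (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            # Keep the duration so callers can inspect it afterwards.
            results[name] = elapsed
        _logger.info("@@@ %s took %.3fs", name, elapsed)


timings = {}
with timed_stage("collect", timings):
    # Stand-in for a repack stage such as gathering loose objects.
    collected = {n for n in range(1000)}
```

Wrapping the pack iteration and the `add_objects()` call this way would report the two slow stages directly, without subtracting timestamps by hand.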

A second issue is that after calling repack(), I seem to have lost HEAD, and am unable to find what should be a valid SHA:

$ git log
fatal: bad object HEAD
$ git show 62bc27
fatal: bad object 62bc27
$ ls -lR objects/
objects/:
total 8
drwxrwxr-x 2 earl earl 4096 Aug 24 09:14 info/
drwxrwxr-x 2 earl earl 4096 Aug 24 09:15 pack/

objects/info:
total 0

objects/pack:
total 0
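
The empty listing above can also be checked programmatically, by counting loose objects and pack files before and after the repack. This sketch uses only the standard library and assumes the usual on-disk layout (two-hex-digit fan-out directories for loose objects, `*.pack` files under `pack/`); `count_object_files` is a hypothetical helper, not a dulwich function.

```python
import os
import string
import tempfile


def count_object_files(objects_dir):
    """Count loose objects and pack files under an objects directory."""
    loose = packs = 0
    for entry in os.listdir(objects_dir):
        path = os.path.join(objects_dir, entry)
        if not os.path.isdir(path):
            continue
        if entry == "pack":
            packs = sum(1 for f in os.listdir(path) if f.endswith(".pack"))
        elif len(entry) == 2 and all(c in string.hexdigits for c in entry):
            # Loose objects live in two-hex-digit fan-out directories.
            loose += len(os.listdir(path))
    return {"loose": loose, "packs": packs}


# Demo on a throwaway directory that mirrors the listing above.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "info"))
    os.makedirs(os.path.join(tmp, "pack"))
    empty_counts = count_object_files(tmp)

print(empty_counts)
```

If both counts are non-zero before `repack()` and both are zero afterwards, the objects were removed without a replacement pack being left behind, which matches the `bad object HEAD` error.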

Is this unexpected, or am I mistaken in thinking that repack() is a drop-in replacement for pack_loose_objects()?

@@ -153,9 +156,9 @@ class _Repo(_Gitorious):
             # Try using repack to control the number of open file descriptors
             # that dulwich requires:
             #
-            # https://github.com/jelmer/dulwich/issues/281
+            # https://github.com/jelmer/dulwich/issues/549
 
-            self.__repo.object_store.pack_loose_objects()
+            self.__repo.object_store.repack()

earlchew, Aug 24 '17 16:08

(follow up for that last comment in #552)

jelmer, Aug 26 '17 11:08