
ideas for using the flags

Open ThomasWaldmann opened this issue 1 year ago • 3 comments

Since #8513, the ChunkIndex is a mapping from a 256-bit key to a (32-bit flags, 32-bit size) value.
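For a rough sense of what one entry costs, here is a minimal sketch that packs an entry's value into 8 bytes with Python's `struct` module. The little-endian `<II` layout and the helper names `pack_value` / `unpack_value` are assumptions for illustration, not borg's actual on-disk format.

```python
import struct

# Assumed value layout (not necessarily borg's): flags and size as
# two unsigned 32-bit little-endian integers, 8 bytes in total.
VALUE_FORMAT = "<II"


def pack_value(flags: int, size: int) -> bytes:
    """Pack (flags, size) into the 8-byte value of one index entry."""
    return struct.pack(VALUE_FORMAT, flags, size)


def unpack_value(data: bytes) -> tuple:
    """Unpack an 8-byte value back into (flags, size)."""
    return struct.unpack(VALUE_FORMAT, data)


key = bytes(32)                # a 256-bit chunk id (all zeros here)
value = pack_value(0b1, 4096)  # flags with bit 0 set, size 4096
print(len(key) + len(value))   # 40
print(unpack_value(value))     # (1, 4096)
```

So each entry needs 40 bytes for key plus value, before any hashtable overhead, and all 32 flag bits fit in the same 8-byte value as the size.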

The user flags are:

  • F_USED, meaning the chunk is referenced / used. This is used by borg compact to determine which chunks are used and which are not (and then to delete the unused chunks from the repo).
  • F_COMPRESS, meaning the chunk should be (re-)compressed; used by borg repo-compress.
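The F_USED flag lends itself to a mark-and-sweep pass. Below is a minimal sketch, assuming a plain dict as a stand-in for the ChunkIndex and a hypothetical bit value for F_USED; in borg compact the referenced ids would come from walking all archives, and the actual deletion of unused chunks from the repo is left out here.

```python
# Hypothetical bit position; borg's real constant may differ.
F_USED = 1 << 0

# Simplified stand-in for the ChunkIndex: chunk id -> [flags, size].
index = {
    b"a" * 32: [0, 100],
    b"b" * 32: [0, 200],
    b"c" * 32: [F_USED, 300],  # stale mark left over from an earlier run
}


def mark(index, referenced_ids):
    """Clear old marks, then set F_USED on every referenced chunk."""
    for entry in index.values():
        entry[0] &= ~F_USED
    for chunk_id in referenced_ids:
        index[chunk_id][0] |= F_USED


def sweep(index):
    """Return the ids of chunks that no archive references."""
    return [cid for cid, (flags, _size) in index.items() if not flags & F_USED]


mark(index, [b"a" * 32])
print(sweep(index) == [b"b" * 32, b"c" * 32])  # True
```

Clearing stale marks first matters: without it, chunk `c` would still look used even though nothing references it any more.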

The system flags are:

  • F_NEW, meaning the chunk was added after the chunk index was last saved (this flags the chunks that need to be saved next).
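A minimal sketch of how F_NEW allows saving only what changed. The bit position, the dict stand-in for the ChunkIndex, and the `write` callback (standing in for however the entries actually get persisted) are all hypothetical.

```python
# Hypothetical bit position for the system flag F_NEW.
F_NEW = 1 << 31

# Simplified stand-in for the ChunkIndex: chunk id -> [flags, size].
index = {}


def add_chunk(index, chunk_id, size):
    """Add a new chunk and flag it as not yet saved."""
    index[chunk_id] = [F_NEW, size]


def save_new_entries(index, write):
    """Pass only F_NEW entries to write(), then clear their F_NEW bit."""
    saved = 0
    for chunk_id, entry in index.items():
        if entry[0] & F_NEW:
            write(chunk_id, entry[1])
            entry[0] &= ~F_NEW
            saved += 1
    return saved


written = []
add_chunk(index, b"x" * 32, 10)
add_chunk(index, b"y" * 32, 20)
print(save_new_entries(index, lambda cid, size: written.append(cid)))  # 2
add_chunk(index, b"z" * 32, 30)
print(save_new_entries(index, lambda cid, size: written.append(cid)))  # 1
```

After each save the flag is cleared, so the next save only touches entries added since then.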

Other flag bits are not used yet and are available for creative uses.

Besides using these bits as flags/markers of some kind, we can also do memory-efficient set operations (no additional memory use!), e.g. computing a set intersection:

  • set bit1 for members of set1
  • set bit2 for members of set2
  • intersection set1 & set2: iterate over all entries in the ChunkIndex, entries with both bits set are members of the intersection.
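The three steps above can be sketched as follows, assuming two spare flag bits at hypothetical positions and a dict as a stand-in for the ChunkIndex. The members are recorded in the entries' existing flag words instead of in Python sets; collecting the result into a list is only for illustration, and the scratch bits are cleared afterwards so they can be reused.

```python
# Two spare flag bits used as scratch space (hypothetical positions).
BIT1 = 1 << 8
BIT2 = 1 << 9
BOTH = BIT1 | BIT2

# Simplified stand-in for the ChunkIndex: chunk id -> [flags, size].
index = {bytes([i]) * 32: [0, 0] for i in range(5)}


def intersect(index, set1_ids, set2_ids):
    """Return ids present in both inputs, marking members via flag bits."""
    for cid in set1_ids:
        index[cid][0] |= BIT1
    for cid in set2_ids:
        index[cid][0] |= BIT2
    result = [cid for cid, (flags, _size) in index.items() if (flags & BOTH) == BOTH]
    for entry in index.values():  # clear the scratch bits for the next operation
        entry[0] &= ~BOTH
    return result


ids = list(index)
print(intersect(index, ids[0:3], ids[1:5]) == ids[1:3])  # True
```

Other set operations only change the final test: a difference set1 - set2 selects entries with `(flags & BOTH) == BIT1`, a union those with `flags & BOTH` nonzero. The inputs can be streamed (e.g. generators over archive contents), so neither set has to be materialized.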

Please collect below any ideas for what can be done with this.

ThomasWaldmann, Nov 01 '24 23:11

In #8503 there was the idea of F_CLEAN (when not set, the hashtable entry is "dirty", meaning it has not yet been written to storage).

Update: Instead of the clean/dirty metaphor, "new" (F_NEW) was found to be more appropriate.

Update: fixed by #8541

ThomasWaldmann, Nov 05 '24 17:11

borg repo-compress could flag the chunks it finds in need of recompression. Currently it uses a Python data structure for them.

Update: fixed by #8543
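A minimal two-phase sketch of that idea (not the code merged in #8543): phase 1 sets F_COMPRESS instead of adding ids to a Python set, phase 2 processes and clears the flagged entries. The bit value, the dict stand-in for the ChunkIndex, and the `needs_recompression` / `recompress` callbacks are hypothetical.

```python
# Hypothetical bit position; not necessarily borg's real F_COMPRESS value.
F_COMPRESS = 1 << 1

# Simplified stand-in for the ChunkIndex: chunk id -> [flags, size].
index = {bytes([i]) * 32: [0, 100 * (i + 1)] for i in range(4)}


def flag_for_recompression(index, needs_recompression):
    """Phase 1: set F_COMPRESS instead of collecting ids in a Python set."""
    for cid, entry in index.items():
        if needs_recompression(cid):
            entry[0] |= F_COMPRESS


def recompress_flagged(index, recompress):
    """Phase 2: process every flagged chunk once and clear its flag."""
    count = 0
    for cid, entry in index.items():
        if entry[0] & F_COMPRESS:
            recompress(cid)
            entry[0] &= ~F_COMPRESS
            count += 1
    return count


done = []
flag_for_recompression(index, lambda cid: cid[0] % 2 == 0)  # hypothetical predicate
print(recompress_flagged(index, done.append))  # 2
```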

ThomasWaldmann, Nov 14 '24 19:11

Maybe borg check could use the flags to avoid having to read repo chunks multiple times in different parts of the check.
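One way this could look, as a sketch only: two hypothetical spare bits (named F_CHECKED and F_CORRUPT here, which are not existing borg flags) cache each chunk's check result, so later check phases reuse it instead of reading the chunk again. `read_and_verify` stands in for whatever actually reads and verifies a chunk.

```python
# Hypothetical spare bits for caching per-chunk check results.
F_CHECKED = 1 << 2
F_CORRUPT = 1 << 3

# Simplified stand-in for the ChunkIndex: chunk id -> [flags, size].
index = {b"a" * 32: [0, 10], b"b" * 32: [0, 20]}


def chunk_ok(index, cid, read_and_verify):
    """Read and verify a chunk at most once; later calls reuse the cached result."""
    entry = index[cid]
    if not entry[0] & F_CHECKED:
        entry[0] |= F_CHECKED
        if not read_and_verify(cid):
            entry[0] |= F_CORRUPT
    return not entry[0] & F_CORRUPT


reads = []


def read_and_verify(cid):
    reads.append(cid)
    return cid != b"b" * 32  # pretend chunk "b" is corrupt


# Two check phases asking about the same chunks.
results = [chunk_ok(index, cid, read_and_verify) for cid in list(index) * 2]
print(results, len(reads))  # [True, False, True, False] 2
```

Caching the failure as well as the success means a corrupt chunk is also read only once, however many check phases look at it.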

ThomasWaldmann, Nov 15 '24 09:11