split index support
What is a split index?
When using a split index the index is split up into two separate files:
- the split index at $GIT_DIR/index
- the shared index at $GIT_DIR/sharedindex.<SHA-1>
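To make the file layout concrete, here is a minimal Rust sketch that derives both paths from a git directory. The function names are made up for illustration and are not part of git or gitoxide; the hash is assumed to be passed in already hex-encoded.

```rust
use std::path::{Path, PathBuf};

/// Path of the split index, which lives at the regular index location.
/// (Hypothetical helper, for illustration only.)
fn split_index_path(git_dir: &Path) -> PathBuf {
    git_dir.join("index")
}

/// Path of the shared index for a given hex-encoded SHA-1.
/// (Hypothetical helper, for illustration only.)
fn shared_index_path(git_dir: &Path, sha1_hex: &str) -> PathBuf {
    git_dir.join(format!("sharedindex.{}", sha1_hex))
}
```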
The shared index contains all entries, while the split index accumulates changes on top of it. These changes are occasionally written into the shared index, either "automatically" based on config settings, or by the git-update-index command.
"Automatically" here means that the check for whether to do so runs every time the index is read or updated.
Link Extension
The link extension stores two bitmaps that record how the changes stored in the split index should be merged with the shared index.
The replace bitmap records which entries in the shared index should be replaced by entries stored in the split index, e.g.
[0, 1, 1, 0, 1] -> replace shared index entries at index 1, 2, 4 with split index entries at 0, 1, 2
Any additional entries in the split index should be added to the shared index.
The delete bitmap records which entries in the shared index should be deleted, e.g.
[1, 0, 0, 1] -> delete entries at index 0 and 3
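The merge described above could be sketched in Rust roughly as follows. This is only an illustration under a few assumptions: `Entry`, `entry` and `merge_split_into_shared` are hypothetical names, both bitmaps are assumed to be already decoded into `bool` slices (reading the on-disk encoding is out of scope), replacement entries are assumed to carry their full path, and the result is simply sorted by path.

```rust
/// Hypothetical stand-in for an index entry; `id` mimics the object id.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    path: String,
    id: u32,
}

/// Convenience constructor for the sketch.
fn entry(path: &str, id: u32) -> Entry {
    Entry { path: path.to_string(), id }
}

/// Merge the split index entries into the shared index entries.
/// `replace` and `delete` hold one flag per shared entry; missing flags count as unset.
fn merge_split_into_shared(
    shared: &[Entry],
    split: &[Entry],
    replace: &[bool],
    delete: &[bool],
) -> Vec<Entry> {
    let mut split_entries = split.iter();
    let mut merged = Vec::with_capacity(shared.len() + split.len());
    for (position, shared_entry) in shared.iter().enumerate() {
        let is_replaced = replace.get(position).copied().unwrap_or(false);
        let is_deleted = delete.get(position).copied().unwrap_or(false);
        // Every set replace bit consumes the next split entry, in order,
        // so later bits stay aligned even if this entry is also deleted.
        let replacement = if is_replaced { split_entries.next() } else { None };
        if is_deleted {
            continue;
        }
        merged.push(replacement.unwrap_or(shared_entry).clone());
    }
    // Whatever is left in the split index is an additional entry.
    merged.extend(split_entries.cloned());
    // Simplification: order by path only.
    merged.sort_by(|a, b| a.path.cmp(&b.path));
    merged
}
```

Using the two example bitmaps together on a five-entry shared index (a to e), entries a and d are dropped, b, c and e are replaced, and a fourth split entry would be appended as a new entry.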
Tasks
- [x] check git source code to learn more about how split index works and update this issue
- [x] #655
- [ ] writing
  - [ ] write only the split index without ever merging / updating the shared index
    - [ ] if this turns out to be too much work just write a regular index and discard the split / shared index files
  - [ ] merge split and shared indexes (based on maxPercentChange) - source code reference
- [ ] update active shared index modification time every time the split index is being read / updated to prevent automatic deletion
Config Settings
- core.splitIndex: enables the use of a split index.
- splitIndex.maxPercentChange: the percentage threshold of entries in the split index (compared to the shared index) that triggers a write to the shared index. It defaults to 20.
- splitIndex.sharedIndexExpire: if the modification time of a shared index is older than this value it will be deleted. Takes values like "now", "never", "2.weeks.ago" (default). This needs to be parsed.
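The maxPercentChange threshold could be checked along these lines. This is a rough sketch of the comparison as described in this issue, not a port of git's logic: the function name is hypothetical, and how exactly git counts entries or treats boundary values is not covered here.

```rust
/// Returns true if the split index holds more than `max_percent_change` percent
/// of the number of entries in the shared index.
/// (Hypothetical helper; entry counts are assumed to be known by the caller.)
fn exceeds_max_percent_change(
    split_entries: usize,
    shared_entries: usize,
    max_percent_change: u8,
) -> bool {
    // Compare in integers to avoid floating point rounding.
    split_entries * 100 > shared_entries * max_percent_change as usize
}
```

With the default of 20 and a shared index of 100 entries, 20 split entries would not trigger a write, while 21 would.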
Notes
- split index and sparse index are incompatible in git
- git uses date.c to parse dates
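For the sharedIndexExpire values mentioned in this issue, a much reduced parser might look like the sketch below. It only understands "now", "never" and the "<count>.<unit>.ago" shape, which is far less than what git's date parsing accepts; `Expire`, `parse_expire` and `is_expired` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// Parsed form of an expiry setting (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Expire {
    Now,
    Never,
    OlderThan(Duration),
}

/// Parse "now", "never" or "<count>.<unit>.ago"; anything else yields `None`.
fn parse_expire(value: &str) -> Option<Expire> {
    match value {
        "now" => return Some(Expire::Now),
        "never" => return Some(Expire::Never),
        _ => {}
    }
    let mut parts = value.split('.');
    let count: u64 = parts.next()?.parse().ok()?;
    let unit = parts.next()?;
    if parts.next()? != "ago" || parts.next().is_some() {
        return None;
    }
    // Accept singular and plural unit names, e.g. "week" and "weeks".
    let seconds_per_unit: u64 = match unit.trim_end_matches('s') {
        "second" => 1,
        "minute" => 60,
        "hour" => 3_600,
        "day" => 86_400,
        "week" => 7 * 86_400,
        _ => return None,
    };
    Some(Expire::OlderThan(Duration::from_secs(count.checked_mul(seconds_per_unit)?)))
}

/// Decide whether a shared index with the given modification time has expired.
fn is_expired(mtime: SystemTime, now: SystemTime, expire: &Expire) -> bool {
    match expire {
        Expire::Now => true,
        Expire::Never => false,
        Expire::OlderThan(age) => match now.duration_since(mtime) {
            Ok(elapsed) => elapsed > *age,
            // A modification time in the future is never considered expired.
            Err(_) => false,
        },
    }
}
```

Passing `now` in explicitly keeps the check deterministic, which makes it easy to test without touching the filesystem.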
Questions
- Does the split index get emptied / recreated from scratch when writing its changes to the shared index?
- I assume the actively used shared index can and should never be deleted, but based only on the sharedIndexExpire setting it could be
Thanks for summing up this feature. This issue shall be the authority on it.
Besides the need to be able to read the shared index and process it so the index can be used, I have a strong feeling that this capability is legacy already. With sparse indices and the index DIR entry, I think there isn't going to be a problem with index sizes anymore. As far as I know, and this might be very wrong, split index support isn't really available for plenty of features or interactions, particularly with sparse indices. I guess they will add it when these sparse indices get so large that a split index is reasonable, and I think gitoxide wouldn't be the first to implement this one.
My stance here is to only add minimal support and be OK with 'unsplitting' the index when writing by dropping the extension (and writing the whole index), and then wait and see if writing gives any trouble.
I am curious as to what you discover as you dig into the git source and the history of the split index feature :).