jocko
jocko copied to clipboard
commitlog recovery time
The recovery time of a commitlog is currently linear with the number of entries in the log. As a result, a log with many messages (e.g. millions) can take several seconds or more to recover. The reason is the index is rebuilt by reading back the log:
https://github.com/travisjeffery/jocko/blob/1c69c5c274caf8d6b3a7953c67e5de51c55378be/commitlog/segment.go#L60-L65
It's unclear to me what the purpose of truncating the index and rebuilding it is. AFAICT, it just initializes the segment's nextOffset and position. However, the index sanity check is already reading the last entry which we can recover nextOffset from. Is there another purpose for rebuilding the index?
So it appears one issue is this does not actually seem to do what it says:
https://github.com/travisjeffery/jocko/blob/1c69c5c274caf8d6b3a7953c67e5de51c55378be/commitlog/index.go#L200-L208
The issue is that the memory-mapped index file is 10MB, so reading at idx.position-entryWidth actually reads an empty entry (since relEntry adds the base offset, the offset comparison passes). It seems this precludes the optimization I was thinking about. I wonder if we could just do a binary search to find the real last entry though...
Yeah we shouldn't have to rebuild every time. Initially, I didn't and then someone hit a bug where the index wasn't setup properly and submitted a PR doing this to hack a fix and I left it to optimize later.
I can't quite remember what Kafka does here but binary searching for the last entry sounds good to me.
I have it working with a binary search. Will submit a PR shortly.
Awesome - you da man.
@travisjeffery One snag I've run into: I'm using a fork of the commitlog package which I've modified index entries to also contain a Size field which is the size of the log message that I'm using for another project. This allowed me to find the first empty entry doing the following:
i := sort.Search(n, func(i int) bool {
if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
panic(err)
}
return entry.Position == 0 && entry.Size == 0
})
Of course, Jocko doesn't have the Size field. We could do something like the following:
i := sort.Search(n, func(i int) bool {
if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
panic(err)
}
return entry.Position == 0 && i > 0
})
However, the problem is handling the special case where the index has a single entry. We have no way of distinguishing between an empty index and an index with one entry in it since in both cases the first entry will have a Position of 0. In my case, the Size field is always non-zero for valid entries and 0 for "empty" entries, but we don't have that luxury here.
Any ideas?
I think in NewIndex we can check if the index file exists and if it does assume the index is legit and that the length of the file marks the position of the next entry. And we can tell the number of entries based on this, cause if there is no entry the file will be 0 bytes, with 1 entry it'll be 8 bytes.
We gotta change the code in here: https://github.com/travisjeffery/jocko/blob/master/commitlog/index.go#L86-L89. Cause right now the index file itself is initialized to the full size, when we should leave index file as is and only initialize the mmap's length by passing the length into the MapRegion or MapAt apis.
Not sure if the gommap lib which is pretty jank will support that though, which is what Kafka is able to do I think. Alternatively when the index is created we could write -1 as the first offset and overwrite that on the first entry.