mmc
mmc copied to clipboard
Simple optimization?
typedef struct {
const BYTE* levelUp;
const BYTE* nextTry;
} selectNextHop_t;
(apart from obvious using offets instead of pointers) you could cache the next (MINMATCH+1) byte of the sequence
typedef struct {
const int32 levelUp: 28;
const int32 nextTry: 28;
int8 nextByte: 8;
} selectNextHop_t;
while promoting to the next level the nextByte should be updated to keep the next-next symbol (symbol at MINMATCH+LEVEL offset). While searching you will start from comparing with the nextByte from the structure (and that value will be different in most cases). And I believe in theory that can signifinactly reduce the amount of memory jumps into the buffer and improve performance 😃
This is a sound idea. It seems this would reduce the amount of memory accesses, so it could improve speed.
The bit twiddling hacks to store the necessary information into a packed structure tends to be a bit concerning, as there are ways this could end badly, both for portability and for performance. But I guess this is secondary topic that could be addressed in a second step.
Dear Yann,
Could you please explain one thing.
How the link between the node A and the node C is implemented?
Reading the code didn't help me to find the answer.. 😃
From what I understand the
nextTry & MASK is the index of the next (previous in time) node at the lowest level (it would give you the link from A to B, not to C).
How you were able to avoid adding a distinct explicit field like indexOfTheNextNodeOnTheSameLevel which value can be different from nextTry & MASK for the nodes on the levels above the lowest one? (upd: on the lowest level it can also be different to skip the nodes which were pushed to the upper levels)
It's been a while since I looked at this code.
If I understand correctly, this picture comes from the MMC homepage, and explains how a new chain, of guaranteed minimum match length N+1, is created.
Considering the previous steps, I don't think that the link from A to C existed beforehand. Instead, it is discovered, by scrubbing the previous chain (mml=n) Once a position of length N+1 is discovered, it's extracted from the mml=n chain, and becomes a new chained member to the chain mml=n+1. This is the operation that the red arrows symbolizes.
Yes, the picture is from that page 😄 I mean I thought the discovered link A->C is probably somehow stored for future use and so next time you will search for the same N+1 string you will be able to jump from A to C not visiting the B and other intermediate nodes on the lower level.
Once it's discovered, it's indeed saved, as a "good" link. But it first needs to be discovered. This is done by scrubbing the list of "unclassified" candidates.
My question is how this saving is done. I believe some field of the node A should keep that good link.
But the nodes only have const BYTE* levelUp and const BYTE* nextTry.
The value A.nextTry % ChainTableSize is an index of B, not C.
Maybe the nodes themselves are somehow moved inside the chain table? I mean the node C is copied to the position of the node B in the table and before that the node B is moved to some other place and so on.
The new chain starts with levelUp, which essentially states that, from now on, all candidates in the chain are guaranteed to be matches of at least N+1 bytes. Then, all remaining positions within the underlying N chain are scrubbed, and any match of length N+1 is brought into the N+1 chain, using the nextTry link. Doing so, such position leaves its previous chain, which was only guaranteeing N bytes (making it shorter). But note that, on reaching the N+1 chain, they are now considered "unsorted". Meaning the next time we reach this part of the chain, they will be tested for N+2 promotion. The promotion process ends on reaching the previous Gateway to N+1 (if it existed) or on reaching the end of the N chain.