mmc icon indicating copy to clipboard operation
mmc copied to clipboard

Simple optimization?

Open artelk opened this issue 2 years ago • 7 comments

typedef struct {
    const BYTE* levelUp;
    const BYTE* nextTry;
} selectNextHop_t;

(apart from obvious using offets instead of pointers) you could cache the next (MINMATCH+1) byte of the sequence

typedef struct {
    const int32 levelUp: 28;
    const int32 nextTry: 28;
    int8 nextByte: 8;
} selectNextHop_t;

while promoting to the next level the nextByte should be updated to keep the next-next symbol (symbol at MINMATCH+LEVEL offset). While searching you will start from comparing with the nextByte from the structure (and that value will be different in most cases). And I believe in theory that can signifinactly reduce the amount of memory jumps into the buffer and improve performance 😃

artelk avatar Sep 05 '23 14:09 artelk

This is a sound idea. It seems this would reduce the amount of memory accesses, so it could improve speed.

The bit twiddling hacks to store the necessary information into a packed structure tends to be a bit concerning, as there are ways this could end badly, both for portability and for performance. But I guess this is secondary topic that could be addressed in a second step.

Cyan4973 avatar Sep 05 '23 16:09 Cyan4973

Dear Yann, Could you please explain one thing. Capture How the link between the node A and the node C is implemented? Reading the code didn't help me to find the answer.. 😃 From what I understand the nextTry & MASK is the index of the next (previous in time) node at the lowest level (it would give you the link from A to B, not to C). How you were able to avoid adding a distinct explicit field like indexOfTheNextNodeOnTheSameLevel which value can be different from nextTry & MASK for the nodes on the levels above the lowest one? (upd: on the lowest level it can also be different to skip the nodes which were pushed to the upper levels)

artelk avatar Oct 06 '23 11:10 artelk

It's been a while since I looked at this code.

If I understand correctly, this picture comes from the MMC homepage, and explains how a new chain, of guaranteed minimum match length N+1, is created.

Considering the previous steps, I don't think that the link from A to C existed beforehand. Instead, it is discovered, by scrubbing the previous chain (mml=n) Once a position of length N+1 is discovered, it's extracted from the mml=n chain, and becomes a new chained member to the chain mml=n+1. This is the operation that the red arrows symbolizes.

Cyan4973 avatar Oct 06 '23 16:10 Cyan4973

Yes, the picture is from that page 😄 I mean I thought the discovered link A->C is probably somehow stored for future use and so next time you will search for the same N+1 string you will be able to jump from A to C not visiting the B and other intermediate nodes on the lower level.

artelk avatar Oct 06 '23 18:10 artelk

Once it's discovered, it's indeed saved, as a "good" link. But it first needs to be discovered. This is done by scrubbing the list of "unclassified" candidates.

Cyan4973 avatar Oct 06 '23 18:10 Cyan4973

My question is how this saving is done. I believe some field of the node A should keep that good link. But the nodes only have const BYTE* levelUp and const BYTE* nextTry.

The value A.nextTry % ChainTableSize is an index of B, not C. Maybe the nodes themselves are somehow moved inside the chain table? I mean the node C is copied to the position of the node B in the table and before that the node B is moved to some other place and so on.

artelk avatar Oct 06 '23 18:10 artelk

The new chain starts with levelUp, which essentially states that, from now on, all candidates in the chain are guaranteed to be matches of at least N+1 bytes. Then, all remaining positions within the underlying N chain are scrubbed, and any match of length N+1 is brought into the N+1 chain, using the nextTry link. Doing so, such position leaves its previous chain, which was only guaranteeing N bytes (making it shorter). But note that, on reaching the N+1 chain, they are now considered "unsorted". Meaning the next time we reach this part of the chain, they will be tested for N+2 promotion. The promotion process ends on reaching the previous Gateway to N+1 (if it existed) or on reaching the end of the N chain.

Cyan4973 avatar Oct 06 '23 19:10 Cyan4973