containers icon indicating copy to clipboard operation
containers copied to clipboard

IntSet: reverse bitmap for faster comparison?

Open jwaldmann opened this issue 6 years ago • 4 comments

This is just an idea to improve instance Ord IntSet (related to #470 ). It's quite a pervasive change, and it'd help only in a special case - that may occur frequently, though.

When all elements of the IntSet are small, the tree is in fact Tip prefix bitmap. For just that special case, instance Eq IntSet is just two comparisons of machine words, but instance Ord IntSet (in suggested #670) needs more ops (more than 10, see relateTipTip, relateBM).

It would be much easier if compare (Tip p bm1) (Tip p bm2) = compare bm1 bm2 but since the comparison must have fromAscList semantics, we need = compare (reverse bm1) (reverse bm2) (the implementation does not actually use revNat)

Instead of doing the reversal here, we could define bitmapOf x not as 2^x but 2^(wordSize - 1 - x)

In the general case (comparing Tips that sit below Bin) we need the Relation result (that encodes 5 possible results) so there's no hope of doing this in one op.

I think that the underlying reason for all this is that some ops on machine words are uniform (direction does not matter, as in .&.), some are symmetric (two directions, but identical cost, e.g., shift-left, shift-right), but some are asymmetric (one direction, the other one is missing: lexicographic comparison, carry propagation in arithmetical operations).

Now everything regarding bitmaps (not prefixes!) in IntSet is uniform or symmetric - except for this comparison?

jwaldmann avatar Jul 28 '19 13:07 jwaldmann

I remember those options being discussed in the original paper; you may want to take a look and see to what extent the times have changed. Another thought, for IntSet but not IntMap: what if we store the mask index (instead of the mask) in the low bits of the prefix? That lets us drop a word from each Bin node. Some extra operations may be required some places, so I don't know how performance will compare, ultimately.

treeowl avatar Jul 28 '19 17:07 treeowl

"options being discussed in the original paper" - you mean Okasaki and Gill 1998 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452 ?

This is for Maps, not Sets, so they don't have bitmaps in the leaves.

They discuss little endian vs. big endian patricia trees - and make this remark regarding asymmetric operations: "there does not appear to be a clever bit-twiddling solution to calculate the highest one bit in a number, as there was for the lowest one bit".

This point is moot now since containers uses clz/ctz primops?

I am not seeing any reference to bitwise tricks in Morrison 1968 http://www.mathcs.emory.edu/~cheung/papers/XML/PatriciaTrie-JACM1968.pdf (if you meant that with "original paper")

Before we go any further with this idea (of reversing bitmaps for faster instance Ord IntSet) I will look at what happens with type IntSet = Word in my NFA->DFA benchmark (where I know that all the sets have small elements).

jwaldmann avatar Jul 31 '19 08:07 jwaldmann

type IntSet = Word

see https://github.com/jwaldmann/containers/blob/intset%3Dword/containers-tests/benchmarks/IntSet.hs#L53

this gives results like

benchmarked instanceOrd:dense/16/@IntSet
time                 336.4 ms   (333.0 ms .. 341.3 ms)

benchmarked instanceOrd:dense/16/@Word
time                 274.1 ms   (272.8 ms .. 274.9 ms)

I conclude that a more efficient implementation of instance Ord IntSet can at best save another 20 percent runtime w.r.t. the current proposal (for the case that the sets are really Tips).

I am not really certain about these data: I sprinkled the code with Inline and Specialize pragmas. Some of them do change the runtimes. I have no precise idea why.

As I currently do not have an actual use case (I deal with automata sometimes, but I don't need NFA->DFA right now), I will not push this any further.

jwaldmann avatar Aug 02 '19 18:08 jwaldmann

Do you have any thoughts on my idea of compressing the prefix and mask into one word for IntSet.

On Fri, Aug 2, 2019, 2:42 PM jwaldmann [email protected] wrote:

type IntSet = Word

see https://github.com/jwaldmann/containers/blob/intset%3Dword/containers-tests/benchmarks/IntSet.hs#L53

this gives results like

benchmarked instanceOrd:dense/16/@IntSet time 336.4 ms (333.0 ms .. 341.3 ms)

benchmarked instanceOrd:dense/16/@Word time 274.1 ms (272.8 ms .. 274.9 ms)

I conclude that a more efficient implementation of instance Ord IntSet can at best save another 20 percent runtime w.r.t. the current proposal (for the case that the sets are really Tips).

I am not really certain about these data: I sprinkled the code with Inline and Specialize pragmas. Some of them do change the runtimes. I have no precise idea why.

As I currently do not have an actual use case (I deal with automata sometimes, but I don't need NFA->DFA right now), I will not push this any further.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/674?email_source=notifications&email_token=AAOOF7IDNRCAG66QHH2MSBLQCR5XXA5CNFSM4IHMN5ZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3ORJSI#issuecomment-517805257, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7LGFBPPWGMZ23SI3A3QCR5XXANCNFSM4IHMN5ZA .

treeowl avatar Aug 02 '19 18:08 treeowl