containers
containers copied to clipboard
IntSet: reverse bitmap for faster comparison?
This is just an idea to improve instance Ord IntSet (related to #470 ). It's quite a pervasive change, and it'd help only in a special case - that may occur frequently, though.
When all elements of the IntSet are small, the tree is in fact Tip prefix bitmap. For just that special case, instance Eq IntSet is just two comparisons of machine words,
but instance Ord IntSet (in suggested #670) needs more ops (more than 10, see relateTipTip, relateBM).
It would be much easier if compare (Tip p bm1) (Tip p bm2) = compare bm1 bm2
but since the comparison must have fromAscList semantics, we need
= compare (reverse bm1) (reverse bm2) (the implementation does not actually use revNat)
Instead of doing the reversal here, we could define bitmapOf x not as 2^x but 2^(wordSize - 1 - x)
In the general case (comparing Tips that sit below Bin) we need the Relation result (that encodes 5 possible results) so there's no hope of doing this in one op.
I think that the underlying reason for all this is that some ops on machine words are uniform (direction does not matter, as in .&.), some are symmetric (two directions, but identical cost, e.g., shift-left, shift-right), but some are asymmetric (one direction, the other one is missing: lexicographic comparison, carry propagation in arithmetical operations).
Now everything regarding bitmaps (not prefixes!) in IntSet is uniform or symmetric - except for this comparison?
I remember those options being discussed in the original paper; you may want to take a look and see to what extent the times have changed. Another thought, for IntSet but not IntMap: what if we store the mask index (instead of the mask) in the low bits of the prefix? That lets us drop a word from each Bin node. Some extra operations may be required some places, so I don't know how performance will compare, ultimately.
"options being discussed in the original paper" - you mean Okasaki and Gill 1998 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452 ?
This is for Maps, not Sets, so they don't have bitmaps in the leaves.
They discuss little endian vs. big endian patricia trees - and make this remark regarding asymmetric operations: "there does not appear to be a clever bit-twiddling solution to calculate the highest one bit in a number, as there was for the lowest one bit".
This point is moot now since containers uses clz/ctz primops?
I am not seeing any reference to bitwise tricks in Morrison 1968 http://www.mathcs.emory.edu/~cheung/papers/XML/PatriciaTrie-JACM1968.pdf (if you meant that with "original paper")
Before we go any further with this idea (of reversing bitmaps for faster instance Ord IntSet) I will look at what happens with type IntSet = Word in my NFA->DFA benchmark (where I know that all the sets have small elements).
type IntSet = Word
see https://github.com/jwaldmann/containers/blob/intset%3Dword/containers-tests/benchmarks/IntSet.hs#L53
this gives results like
benchmarked instanceOrd:dense/16/@IntSet
time 336.4 ms (333.0 ms .. 341.3 ms)
benchmarked instanceOrd:dense/16/@Word
time 274.1 ms (272.8 ms .. 274.9 ms)
I conclude that a more efficient implementation of instance Ord IntSet
can at best save another 20 percent runtime w.r.t. the current proposal
(for the case that the sets are really Tips).
I am not really certain about these data: I sprinkled the code with Inline and Specialize pragmas. Some of them do change the runtimes. I have no precise idea why.
As I currently do not have an actual use case (I deal with automata sometimes, but I don't need NFA->DFA right now), I will not push this any further.
Do you have any thoughts on my idea of compressing the prefix and mask into one word for IntSet.
On Fri, Aug 2, 2019, 2:42 PM jwaldmann [email protected] wrote:
type IntSet = Word
see https://github.com/jwaldmann/containers/blob/intset%3Dword/containers-tests/benchmarks/IntSet.hs#L53
this gives results like
benchmarked instanceOrd:dense/16/@IntSet time 336.4 ms (333.0 ms .. 341.3 ms)
benchmarked instanceOrd:dense/16/@Word time 274.1 ms (272.8 ms .. 274.9 ms)
I conclude that a more efficient implementation of instance Ord IntSet can at best save another 20 percent runtime w.r.t. the current proposal (for the case that the sets are really Tips).
I am not really certain about these data: I sprinkled the code with Inline and Specialize pragmas. Some of them do change the runtimes. I have no precise idea why.
As I currently do not have an actual use case (I deal with automata sometimes, but I don't need NFA->DFA right now), I will not push this any further.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/674?email_source=notifications&email_token=AAOOF7IDNRCAG66QHH2MSBLQCR5XXA5CNFSM4IHMN5ZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3ORJSI#issuecomment-517805257, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7LGFBPPWGMZ23SI3A3QCR5XXANCNFSM4IHMN5ZA .