`<flat_map>`: Can we separately compare key and mapped containers?
Currently, we're implementing flat_(multi)map's comparison operators in inconsistent ways:
https://github.com/microsoft/STL/blob/fbe5394250ee15e451fb686585de1f2fa1603062/stl/inc/flat_map#L769-L774
https://github.com/microsoft/STL/blob/fbe5394250ee15e451fb686585de1f2fa1603062/stl/inc/flat_map#L776-L780
For operator==, key and mapped containers are compared separately. For operator<=>, they are compared step by step together.
It seems to me that the standard wording only specifies the latter ([container.reqmts]/44, [container.opt.reqmts]/4). The current strategy for operator== may behave incorrectly if a comparison beyond the index of the first inequivalent iterator pair doesn't have a well-defined result (i.e. it causes UB, terminates, or throws an exception).
E.g. if `_Left_base._Data.values[0] != _Right_base._Data.values[0]`, perhaps we're generally not allowed to compare `_Left_base._Data.keys[1]` with `_Right_base._Data.keys[1]`.
However, the separate-comparison strategy might still be viable if the element types are "well-behaved" enough (e.g. if they are arithmetic types, or possibly string?).
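For concreteness, here is a rough sketch of the two strategies (a simplified illustration with int elements, not the actual `<flat_map>` code, which operates on the wrapped key/mapped containers):

```cpp
#include <algorithm>
#include <compare>
#include <cstddef>
#include <vector>

// Simplified stand-in for flat_map's underlying storage.
struct Containers {
    std::vector<int> keys;
    std::vector<int> values; // same size as keys, as in flat_map
};

// operator== strategy: the key and mapped containers are compared separately.
bool equal_separate(const Containers& l, const Containers& r) {
    return l.keys == r.keys && l.values == r.values;
}

// operator<=> strategy: key/value pairs are compared step by step together,
// i.e. a lexicographical comparison over the sequence of (key, value) pairs.
std::strong_ordering spaceship_alternating(const Containers& l, const Containers& r) {
    const std::size_t common = std::min(l.keys.size(), r.keys.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (const auto c = l.keys[i] <=> r.keys[i]; c != 0) {
            return c;
        }
        if (const auto c = l.values[i] <=> r.values[i]; c != 0) {
            return c;
        }
    }
    return l.keys.size() <=> r.keys.size();
}
```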
> The current strategy for operator== may behave incorrectly if a comparison beyond the index of the first inequivalent iterator pair doesn't have a well-defined result (i.e. it causes UB, terminates, or throws an exception).
operator== is defined in terms of std::equal(), and the order of comparison isn't specified for std::equal() as far as I can tell. So I think a program is not allowed to assume that equality comparisons for some iterator pairs aren't performed just because there is an inequivalent iterator pair earlier in the sequence.
However, I think the wording in [container.reqmts]/44 still suggests that keys and values should be compared in alternation. So are standard-compliant programs allowed to assume this alternation, or may they observe divergence from it?
> It seems to me that the standard wording only specifies the latter ([container.reqmts]/44, [container.opt.reqmts]/4).

> However, I think the wording in [container.reqmts]/44 still suggests that keys and values should be compared in alternation. So are standard-compliant programs allowed to assume this alternation, or may they observe divergence from it?
The fact that both of the cited paragraphs are "Returns" elements is significant. They specify only the value that a function is expected to return, not a means by which that value is to be calculated nor any of the observable effects of doing so. See [structure.specifications]/3.8 which says that it's "a description of the value(s) returned by the function."
We must return the same value as depicted in the Standard, but we may calculate that value in the way we deem to be best for our implementation. (I don't feel bad for anyone that puts incomparable values in a standard container and then tries to compare that container.)
Oh, I'm sorry for forgetting LWG-3410. Is it intended that lexicographical_compare_three_way can also compare more elements than necessary?
If so, I think we can try the strategy that compares key containers first.
> We must return the same value as depicted in the Standard, but we may calculate that value in the way we deem to be best for our implementation.
Yeah, but this alone is not sufficient for the transformation: if operator== of the keys and operator== of the values are allowed to interact such that they produce different results under alternation than under separate comparison, then the transformation does not produce the same output.
That said, designing such questionable equality comparators is just asking for trouble, so it's probably not worth thinking much about.
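To illustrate the concern, here is a contrived, purely hypothetical example (mine, not anything from the library) of equality operators that communicate through shared state and can therefore observe whether key and value comparisons alternate:

```cpp
#include <cstddef>

// Shared state mutated by Mapped::operator== and read by Key::operator==.
inline std::size_t value_comparisons = 0;

struct Key {
    int k;
    bool operator==(const Key& other) const {
        // Result depends on how many value comparisons happened so far.
        return k == other.k && value_comparisons % 2 == 0;
    }
};

struct Mapped {
    int v;
    bool operator==(const Mapped& other) const {
        ++value_comparisons; // observable side effect
        return v == other.v;
    }
};
```

With two equal (key, value) pairs on each side, the alternating strategy returns false (the second key comparison observes an odd counter), while the separating strategy returns true, since all key comparisons happen before any value comparison.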
> Is it intended that lexicographical_compare_three_way can also compare more elements than necessary? If so, I think we can try the strategy that compares key containers first.
Yes, it may do more comparisons. But I doubt comparing keys first is worth it in terms of complexity of the implementation and computational runtime.
First, the code transformation is not straightforward. To produce the same result, you would have to compute the index where the keys first compare non-equal, then determine the index where the values first compare non-equal, and finally return the ordering at the smaller of these two indices. So you can't just delegate to lexicographical_compare_three_way, because you have to know these indices.
Second, while you can get away with not fully determining the index for the values if it is the larger one, you have to determine the index for the keys exactly if you decide to compare the keys first. And that means you might do lots of unnecessary comparisons between keys. (A sketch of the transformation follows below.)
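For concreteness, a minimal sketch of the transformation just described (hypothetical code with int keys and values; a real implementation would also have to handle heterogeneous comparison categories, and this assumes operator== is consistent with operator<=>):

```cpp
#include <algorithm>
#include <compare>
#include <cstddef>
#include <vector>

std::strong_ordering keys_first_spaceship(
    const std::vector<int>& lkeys, const std::vector<int>& lvals,
    const std::vector<int>& rkeys, const std::vector<int>& rvals) {
    const std::size_t common = std::min(lkeys.size(), rkeys.size());

    // Index of the first non-equal key pair; this must be determined
    // exactly, which is where the unnecessary key comparisons come from.
    std::size_t key_idx = 0;
    while (key_idx < common && lkeys[key_idx] == rkeys[key_idx]) {
        ++key_idx;
    }

    // The values can only decide the result at an index strictly below
    // key_idx (at equal indices, the key pair is compared first), so this
    // search may stop early.
    std::size_t val_idx = 0;
    while (val_idx < key_idx && lvals[val_idx] == rvals[val_idx]) {
        ++val_idx;
    }

    if (val_idx < key_idx) {
        return lvals[val_idx] <=> rvals[val_idx]; // values decide first
    }
    if (key_idx < common) {
        return lkeys[key_idx] <=> rkeys[key_idx]; // keys decide first
    }
    return lkeys.size() <=> rkeys.size(); // all pairs equivalent
}
```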
We talked about this at the weekly maintainer meeting. We don't believe that the Standard should require us to tolerate non-comparable elements here. (We think the Standard's vagueness supports us here.)

As for performance, we expect most equality comparisons to find equality, in which case all keys and all values must be inspected, so separately comparing the keys and then the values is better for locality (for all types) and better for vectorization. For spaceship, @StephanTLavavej thinks this will be uncommon for flat_meow but @CaseyCarter thinks that containers-in-containers are common-ish.

However, we would want to see benchmark results comparing the current approach to "compare keys before comparing values" before deciding whether to change the strategy. Comparing keys first could similarly be better for locality and vectorization, but at the cost of more code to determine the spaceship result (since it isn't as simple as equal && equal).
While I can't provide a benchmark, here is a back-of-the-envelope calculation of how many more (key) comparisons the algorithm separating the key and value comparisons is expected to perform compared to the alternating algorithm (which is optimal in the number of comparisons performed):
Let's assume that comparisons are independent and that the probability that a pair of keys or a pair of values compares different is $p$. The number of value pair comparisons is the same for the alternating and separating algorithms, but the two differ in the number of key comparisons performed. Let $X_a$ and $X_s$ be the positions of the last key pair compared by the alternating and the separating algorithm, respectively.
For simplicity of calculation, let's also assume that the algorithms are applied to sequences of infinite size. Then $X_a$ and $X_s$ are geometrically distributed. The probability that the current key pair is the last compared one is $1-(1-p)^2$ for the alternating and $p$ for the separating algorithm. Thus, the expected positions of the last compared key pair are $E[X_a] = 1/(1-(1-p)^2)$ and $E[X_s] = 1/p$. That means that the separating algorithm is expected to do $E[X_s]/E[X_a] = 2-p$ times as many comparisons of key pairs as the alternating algorithm, so almost twice as many for small $p$. In total, this means the separating algorithm is expected to do up to 50% more comparisons of keys and values.
(Note that this calculation breaks down for containers of $n$ elements if the probability becomes very small, on the order of $p \in O(1/n)$, since the calculation error due to assuming infinite size is no longer negligible. But the estimate remains good even for asymptotically slightly larger $p$.)
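For anyone who wants to check the $2 - p$ estimate empirically, here is a small Monte-Carlo simulation of the stopping positions of the two algorithms (my own sketch, not part of the library):

```cpp
#include <cstdio>
#include <random>

int main() {
    std::mt19937 gen{42};
    for (double p : {0.5, 0.1, 0.01}) {
        std::bernoulli_distribution differs{p};
        long long alt = 0, sep = 0;
        constexpr int trials = 100'000;
        for (int t = 0; t < trials; ++t) {
            // Alternating: stops when the key pair OR the value pair differs.
            long long i = 1;
            while (!(differs(gen) || differs(gen))) {
                ++i;
            }
            alt += i;
            // Separating: stops only when the key pair differs.
            i = 1;
            while (!differs(gen)) {
                ++i;
            }
            sep += i;
        }
        std::printf("p = %.2f: E[X_s]/E[X_a] ~ %.3f (predicted %.3f)\n",
                    p, double(sep) / double(alt), 2 - p);
    }
}
```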
> In total, this means the separating algorithm is expected to do up to 50% more comparisons of keys and values.
I don't quite follow this math.
But the point here is that vectorization would matter more than something like 50%.
And for vectorization, the compiler is not really capable of vectorizing this automatically (except gcc with -O3, which pulls off an insane stunt here, but we don't target gcc). So we're looking at library-side optimization.
For the separating operator==, we have had memcmp for quite a long time, and it works great. For spaceship, we have the vectorized mismatch that does the lexicographical comparison.
For the alternating comparison, we would have to implement the vectorization manually, and that would be nontrivial, considering that we need to support keys and values of different bit widths. Though gcc with -O3 would do its magic even for the alternating case, we might wait quite a long time for our target compiler to implement that, and honestly I'm not sure I even want that, considering how tricky this optimization is from the compiler side.
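To illustrate the point about operator== (a hypothetical sketch, not the actual library code): for trivially comparable element types, the separating strategy reduces to memcmp over the contiguous key and value storage, which is exactly what vectorizes well:

```cpp
#include <cstring>
#include <vector>

bool equal_separate_trivial(const std::vector<int>& lk, const std::vector<int>& lv,
                            const std::vector<int>& rk, const std::vector<int>& rv) {
    // flat_map invariant assumed: lk.size() == lv.size(), rk.size() == rv.size().
    if (lk.size() != rk.size()) {
        return false;
    }
    if (lk.empty()) {
        return true; // avoid passing null pointers to memcmp
    }
    return std::memcmp(lk.data(), rk.data(), lk.size() * sizeof(int)) == 0
        && std::memcmp(lv.data(), rv.data(), lv.size() * sizeof(int)) == 0;
}
```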
Vectorization might move the needle the other way in some cases. (But I have to point out that the 50% figure is only for the expected case under this very simple probability distribution; in the worst case, e.g. when all keys are equal but the very first value pair differs, the separate comparison performs size()/2 times as many comparisons as the alternating algorithm.)
But the math does suggest that the separate comparison should not be the default algorithm for arbitrary key and value types, because it will likely result in worse performance on average when individual key comparisons are not cheap.
Edit: To expand a bit on the point about the worst-case running time:
Let $t_S(x)$ denote the running time of the separated comparison algorithm and $t_A(x)$ that of the alternating comparison algorithm on an input $x$. Then there is a constant $\alpha > 0$ such that $\frac{t_A(x)}{t_S(x)} \leq \alpha$ for all inputs $x$, i.e., the alternating algorithm is at most a constant factor slower than the separated one. Conversely, however, there is no constant $\beta > 0$ such that $\frac{t_S(x)}{t_A(x)} \leq \beta$ for all inputs $x$: on worst-case inputs like the one sketched above, the separated algorithm is slower than the alternating one by a factor that grows with the input size, not just by a constant.