containers icon indicating copy to clipboard operation
containers copied to clipboard

Change in complexity of Set/Map operations

Open meooow25 opened this issue 3 years ago • 8 comments

Why does #830 change the documented complexity of Set/Map operations from $O\bigl(m \log \bigl(\frac{n}{m} + 1 \bigr)\bigr)$ to $O\bigl(m \log\bigl(\frac{n+1}{m+1}\bigr)\bigr)$?

meooow25 avatar Nov 13 '22 10:11 meooow25

Edge cases. When $m=0$, it's not nice to blow up. If you have a better way to suggest, please go ahead.

treeowl avatar Nov 13 '22 10:11 treeowl

I don't think that's a concern in big-O notation. $O(\log n)$ is a common complexity and $\log 0$ is undefined.

meooow25 avatar Nov 13 '22 10:11 meooow25

The tricky thing is that there are two variables. Readers will want to know what happens as $n$ increases without bound when $m=0$.

treeowl avatar Nov 13 '22 11:11 treeowl

Readers will want to know what happens as $n$ increases without bound when $m = 0$

That's not something the big-O complexity can be used for. It is only a bound on how fast the function grows with respect to its inputs as the inputs grow arbitrarily large.

A more reasonable thing a reader might want, perhaps, is to check the complexity for two equal sized sets. With $n = m$, the old complexity gives the value of $O(n)$, and the new one incorrectly gives $O(1)$.

In any case, I think it would be best for documentation to stick to the complexity published in the paper it cites as the source for it. I hope you'll consider changing it back.

meooow25 avatar Nov 14 '22 04:11 meooow25

For $n = m$, the current formula, which is $O(m \log(\tfrac{n+1}{m+1}))$, actually yields $O(0)$, not $O(1)$, which is even worse in a way: $O(1)$ means "constant-time", whereas $O(0)$ means "no time at all".

Admittedly the original formula, namely $O(m \log(\tfrac nm + 1))$, blew up at $m = 0$, which is as sensible an input as $m = n$, but this formula has the advantage of being what the original paper has.

At the very least, the side condition that $m \leq n$ in the current docs should be changed to $m < n$ if the $\tfrac{n+1}{m+1}$ formula is kept — surely the formula is wrong for $m = n$. I would vote for " $O\bigl(m \log\bigl(\tfrac nm + 1\bigr)\bigr)$, $0 < m \leq n$ " because that at least is valid in the domain that it claims to be valid on. :)

@meooow25:

Readers will want to know what happens as increases without bound when $m = 0$

That's not something the big-O complexity can be used for. It is only a bound on how fast the function grows with respect to its inputs as the inputs grow arbitrarily large.

Admittedly the paper does not define its big-O notation, but yes that is how big-O notation works on multiple variables. Wikipedia describes that at least one possible interpretation of big-O notation on multiple variables is that the formula should hold if at least one of the variables grows large. I hope that the paper uses this definition, and not the alternative that you seem to assume where all variables need to grow large, because in that case the formula says nothing about the case where $m$ is small, which is very useful and important special case.

tomsmeding avatar Jun 22 '23 16:06 tomsmeding

Even $O(1)$ is quite wrong for that edge case, so we need to fix this.

treeowl avatar Jun 22 '23 18:06 treeowl

@treeowl Would you accept a PR that changes the complexities of this form to my suggested version, namely $O\bigl(m \log\bigl(\tfrac nm + 1\bigr)\bigr)$, $0 < m \leq n$ ? (Or any other version that does properly account for the edge cases)

tomsmeding avatar Jul 07 '23 12:07 tomsmeding

I won't be able to evaluate this until the weekend. I'm currently traveling.

treeowl avatar Jul 07 '23 13:07 treeowl