jsonnet icon indicating copy to clipboard operation
jsonnet copied to clipboard

Allow arbitrary arrays for most set operations

Open sbarzowski opened this issue 4 years ago • 3 comments

Set operations like setUnion or setDiff technically are only valid on sets (sorted array). This can lead to nasty, hard-to-notice bugs when someone forgets about it.

We should consider relaxing the requirements on the input. Note that we will still need to maintain the requirement for setMember (unless we're willing to accept linear search there, which kinda defeats the point).

sbarzowski avatar Feb 23 '21 23:02 sbarzowski

I'd prefer to leave this as a documentation problem, lest we penalize the intended performance benefits in these functions. It's cheaper to test whether a sequence is already sorted than to sort it, but we'd still be introducing that O(n) cost for every call, even for cases where callers took care to ensure the array was sorted first.

Would you consider sibling functions with related names, such as seqSetUnion, unsortedSetUnion, or setUnionFlex? I don't like any of these names, but I hope you get the idea.

seh avatar Feb 24 '21 00:02 seh

Yes, that was the initial motivation for having them this way. I saw bugs resulting from this decision more than once.

Union, intersection and difference already require O(n) time. The only operation where it really makes a difference currently is setMember.

Would you consider sibling functions with related names, such as seqSetUnion, unsortedSetUnion, or setUnionFlex? I don't like any of these names, but I hope you get the idea.

That wouldn't solve the problem. If someone remembers that it needs to be sorted they can just use std.set. The issue is when someone doesn't remember / didn't notice this requirement.

I have one more idea to solve this, but it's a bit weird. We have implicit under-the-hood special type for sets. It will still work like a regular array from user's perspective, but we'll automagically use more efficient implementations. This is quite complicated, though and the set is only valid with respect to a specific keyF, so it's more than "known-to-be-sorted" bit.

sbarzowski avatar Feb 24 '21 13:02 sbarzowski

Union, intersection and difference already require O(n) time.

Given that, we'd be increasing the coefficient (adding another O(n)), but not the exponent. That's not as bad.

seh avatar Feb 24 '21 13:02 seh