Multiple step sizes lost when calling intersection()
I would expect the code below to print FrameSet("1-10x3,1-10x1"), since both FrameSets are identical, but it prints FrameSet("1-10"). Technically, both FrameSets (fs and fs2) contain the same frames, but FrameSet.order differs between them.
```python
from fileseq import FrameSet

fs = FrameSet("1-10x3,1-10x1")
fs2 = fs.intersection(fs)

# This will be "1-10", not "1-10x3,1-10x1".
print(fs2)

# Note the difference in the order between the two.
print(fs.order)
print(fs2.order)
```
This seems to happen because FrameSet.items is a frozenset that is intersected with another frozenset and then converted back to a frame range. The same may be true of other methods, but intersection() was my use case. I am currently working around the issue by splitting one of the FrameSets by ',', intersecting the pieces one by one, and joining them back together. Any chance for a fix?
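A minimal pure-Python sketch of the behavior described above, without fileseq: `expand_frange` is a hypothetical helper that understands only the simple "A-BxS" comma-separated notation used in this thread, not fileseq's full syntax. It shows that the stored order of "1-10x3,1-10x1" is 1, 4, 7, 10 followed by the remaining frames, and that once the frames pass through a frozenset intersection only membership survives.

```python
def expand_frange(frange):
    """Expand a simplified frame range such as "1-10x3,1-10x1" into an
    ordered list of unique frames (first occurrence wins).

    Hypothetical helper: supports only "N", "A-B" and "A-BxS" chunks with
    non-negative frames, not fileseq's full syntax.
    """
    frames = []
    seen = set()
    for chunk in frange.split(","):
        step = 1
        if "x" in chunk:
            chunk, step_str = chunk.split("x")
            step = int(step_str)
        if "-" in chunk:
            start_str, end_str = chunk.split("-")
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(chunk)
        for frame in range(start, end + 1, step):
            if frame not in seen:
                seen.add(frame)
                frames.append(frame)
    return frames


order = expand_frange("1-10x3,1-10x1")
print(order)  # [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]

# A frozenset intersection keeps membership but has no order; rebuilding a
# range from it (here by sorting) gives the frames of "1-10".
items = frozenset(order)
print(sorted(items & items))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```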
Unfortunately the set-like features that were added to FrameSet act more like Python sets than like a FrameSet. A FrameSet maintains order, while a set does not, and the set methods are simply implemented by delegating to the underlying set.
An implementation like this might be closer to correct:
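```python
def intersection(self, *other):
    # Frames present in self and in every other FrameSet
    # (iterating a FrameSet yields its frames).
    common = set(self.items)
    for o in other:
        common.intersection_update(o)
    # Walk self.order so the result keeps the left operand's ordering.
    ordered = [i for i in self.order if i in common]
    return self.from_iterable(ordered, sort=False)

fs.intersection(fs)
# FrameSet("1-10x3,2-3,5-6,8-9")
```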
This preserves the original order, although it does not preserve the original string notation. That is because FrameSet never really tries to reconstruct the original string where arbitrary overlap occurs; it just saves the original string for printing. If you were to call fs.normalize(), you would get "1-10".
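To see why the string comes back as "1-10x3,2-3,5-6,8-9" rather than the original notation, here is a minimal pure-Python sketch that does not use fileseq. `ordered_intersection` and `compact_frames` are hypothetical helpers, and the greedy compaction (group consecutive frames that share a constant positive step) is an assumption that happens to reproduce the output above, not fileseq's actual code.

```python
def ordered_intersection(order, *others):
    """Frames of `order` present in every iterable in `others`, kept in
    the left operand's order (hypothetical helper)."""
    common = set(order)
    for other in others:
        common.intersection_update(other)
    return [f for f in order if f in common]


def compact_frames(frames):
    """Greedily compact an ordered frame list into "A-BxS" chunks.

    Assumption: consecutive frames sharing the same positive step form one
    chunk; this is an illustration, not fileseq's actual algorithm.
    """
    chunks = []
    i = 0
    n = len(frames)
    while i < n:
        start = frames[i]
        step = frames[i + 1] - start if i + 1 < n else 0
        j = i
        if step > 0:
            # Extend the chunk while the gap between frames stays constant.
            while j + 1 < n and frames[j + 1] - frames[j] == step:
                j += 1
        end = frames[j]
        if j == i:
            chunks.append(str(start))
        elif step == 1:
            chunks.append("%d-%d" % (start, end))
        else:
            chunks.append("%d-%dx%d" % (start, end, step))
        i = j + 1
    return ",".join(chunks)


order = [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]  # stored order of "1-10x3,1-10x1"
kept = ordered_intersection(order, order)
print(compact_frames(kept))          # 1-10x3,2-3,5-6,8-9
print(compact_frames(sorted(kept)))  # 1-10
```

The ordered result can only be rewritten from the frames themselves, so the overlapping "1-10x1" chunk of the original string cannot be recovered; sorting first is what collapses it back to "1-10".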
Maybe I am overthinking it, but it would seem a little complicated to try and preserve "1-10x3,1-10x1" after the intersection, since it would have to analyze the string again and see if the original "1-10x1" was still valid for the final intersection.
Oh, I misspoke. I was concerned with keeping the order, not the string notation - I didn't think about just having the remaining frames listed out individually. That would be just fine for me. Thanks for the quick response!
The intersection of two FrameSet objects is -- intentionally -- the FrameSet that contains the intersection of the set of frames provided on either side. The order of the two FrameSets on either side is -- again intentionally -- abandoned. This is to keep the operators commutative, consistent with set theory. To use the old saw: this is a feature, not a bug, of all the set-like methods on the FrameSet, which were always intended to ignore (and therefore lose in their return values) order.
The reason for this is fairly simple:
```python
fs1 = FrameSet("1-10x3,1-10x1")
```
Contains exactly the same underlying frames as:
```python
fs2 = FrameSet("1,2,3,4,5,6,7,8,9,10")
```
Which means, for set theory purposes, they're both subsets of themselves and of each other.
That implies that their unions should be the same, their intersections the same, and their symmetric difference the same. Also when all of the preceding is true their difference should be the same, regardless of which side of the operator they are on.
Obviously the intersection of a FrameSet with itself should always equal the intersection of that same FrameSet with itself:
```python
(fs1 & fs1) == (fs1 & fs1)
(fs2 & fs2) == (fs2 & fs2)
```
Which implies that these should also be True:
```python
(fs1 & fs2) == (fs1 & fs2)
(fs2 & fs1) == (fs2 & fs1)
```
Now, in set theory, these should also always be True:
```python
(fs1 & fs2) == (fs2 & fs1)
(fs2 & fs1) == (fs1 & fs2)
```
But if we attempt to maintain the order of the left operand over the order of the right in the return value, then these are not True, except in cases where the two happen to have both the same set of underlying frames AND the same order.
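A small illustration of the commutativity argument, modeling fs1 and fs2 as plain Python lists of frames in their stored order (an assumption for illustration, not fileseq's internals or its equality rules): an order-free intersection gives the same answer whichever side is on the left, while a hypothetical left-ordered intersection does not.

```python
# fs1 and fs2 from above, modeled as lists in their stored order.
fs1_order = [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]  # "1-10x3,1-10x1"
fs2_order = list(range(1, 11))               # "1,2,3,4,5,6,7,8,9,10"


def set_intersection(a, b):
    # Order-free: the result depends only on which frames are present.
    return frozenset(a) & frozenset(b)


def left_ordered_intersection(a, b):
    # Hypothetical alternative that keeps the left operand's order.
    keep = set(b)
    return [f for f in a if f in keep]


# Set semantics commute: prints True.
print(set_intersection(fs1_order, fs2_order) ==
      set_intersection(fs2_order, fs1_order))

# Same frames, but the ordered results depend on operand order: prints False.
print(left_ordered_intersection(fs1_order, fs2_order) ==
      left_ordered_intersection(fs2_order, fs1_order))
```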
Striving to maintain order would break the commutative properties of the operators and be inconsistent with set theory. It also breaks more subtle set questions (are two different orderings of the same smaller set of frames NOT both subsets of a larger set if only one of the two had order consistent with the superset?) while also almost necessarily being less performant (trying to algorithmically build a minimal representation consistent with the originally provided string's implied -- but infinitely arbitrary -- order while also representing the intersection accurately is HARD and often ambiguous).
There are two somewhat competing ideas at work here: there is the notion that a FrameSet should be effectively a DSL parser allowing entirely arbitrary orderings of a set of frames via string representation (call this String First), and there's the idea that a FrameSet should be a performant interface for manipulating collections of frames (call this Collection First)... the addition of the set theory functionality was intended to satisfy and enhance the use of FrameSet objects in a Collection First mode, and it provides for that well. It also does work very hard (within any FrameSet created by string and not via a set theory operation) to maintain the ordering and string of the String First mode... where there's a compromise to be made the compromise was made in favor of the mode that inspired the features being added.
I don't know if that helps clarify, and perhaps some more explanation should be incorporated into the docs, but I don't believe the set operations are behaving in a manner inconsistent with the operations they exist to enable, while modifying them to attempt to maintain strong representation ordering would both reduce performance and break the assumptions of code that's been relying on consistent set operations.
M
You bring up a good point about Collection First vs String First. I don't mean to change the behavior or methodology behind the tool; I just wanted to know whether this behavior was expected and intentional. It sounds like it is, and "fixing" it would cause more harm than good. Thank you for taking the time to explain why it is the way it is; I will find a workaround.
Edit: I hadn't accounted for the case where the two sets being intersected have different step sizes, or the same step sizes in a different order.
@yawpitch, thanks for that information.
While I can appreciate the intention to conform the set-like features of FrameSet to set theory, I have to point out that via the #19 merge that introduced this logic, there was never really a discussion about the intention being one way or another. FrameSet, even though it has the word 'set' in it, had previously never been a proper set data structure, and none of the documentation was updated to make guarantees about preserving or abandoning aspects of the original instances in the results. It was just left ambiguous. There should definitely have been more documentation on the changes and expected behaviors, and that was pretty much a ball drop on the merge process of #19 (1.0.0). I've tried to assume a primary role in maintaining the library since then to make sure we discuss changes more in-depth, and also to hopefully provide faster turnarounds for fixes.
> modifying them to attempt to maintain strong representation ordering would ... break the assumptions of code that's been relying on consistent set operations.
Technically that has already happened a few times with the merge of #19 (1.0.0), which changed previous semantics silently. I don't have that list handy or anything. I just recall having issues raised where the library no longer behaved like previous versions (albeit pre 1.x releases) and had to fix them one by one as reported. I don't have any metrics on how many users at my studio are actually consuming the set-like methods currently. I haven't ever used them yet. But yes, since we are at 1.x now and the features are already there, we would have to make semantic changes optional through flags at this point.
I'm hoping a v2 can explore the internal data structure approach that my go/cpp ports are using (github.com/justinfx/gofileseq) which are hopefully equally or more performant, and also far less memory intensive.
Could you perhaps clarify exactly what it is you're trying to do, and why a workaround is required?
From the email chain so far I'm not entirely clear on why you want to take an intersection of a FrameSet with itself to begin with. I would think that the original FrameSet already has the string representation, contents, and order that you seem to be expecting back from the intersection of itself with itself, but I must be missing something.
Thanks,
M
> I'm not entirely clear on why you want to take an intersection of a FrameSet with itself to begin with
I interpreted that as just a simple example of how the intersection of two FrameSet objects loses the order, which could be important to some workflows. The example could have shown two different but overlapping instances, but I assumed the point was that, especially when the two instances are the same, the resulting order should be the same.
Exactly, @justinfx. I am using fileseq to submit renders, and I can have both a scene frame range and a render pass frame range. If the pass range is given, I have to take the intersection of the two, but the user could potentially have provided identical sets. I tried to simplify the example, but it ended up adding confusion.
Thanks for the clarification, I couldn't be certain that assumption was correct.
I'm on my phone and traveling, and sadly not on a machine that has access to fileseq currently, but from memory I believe the easiest way to do what you want would be:
```python
fs3 = FrameSet.from_iterable((f for f in fs1 if f in fs2))
```
That should give you the frames that exist in both, in the order presented by the leftmost FrameSet.
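For the render-submission case described above, a minimal sketch of the same idea in plain Python. `frames_to_render` is a hypothetical helper; with fileseq, its result could be handed to FrameSet.from_iterable(frames, sort=False) as suggested in this thread. Building a set from the pass range first keeps each membership test constant-time.

```python
def frames_to_render(scene_frames, pass_frames=None):
    """Hypothetical helper: the scene frames to render, limited to the pass
    range when one is given, in the scene range's own order."""
    if pass_frames is None:
        # No pass range: render the whole scene range as-is.
        return list(scene_frames)
    allowed = set(pass_frames)
    # Left-ordered intersection: iterate the scene range, filter by the pass.
    return [f for f in scene_frames if f in allowed]


scene = [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]  # stored order of "1-10x3,1-10x1"
print(frames_to_render(scene, range(1, 6)))  # [1, 4, 2, 3, 5]
print(frames_to_render(scene, scene) == scene)  # True: identical sets keep order
```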
@yawpitch yea that is effectively what I proposed in my alternate implementation of intersection(). But it does require from_iterable(..., sort=False), otherwise you are back in the same boat again :-)
Can confirm that those solutions will work for me. I can implement them in my code without any need to change fileseq. Thanks y'all!
The default for the sort kwarg in from_iterable is False, IIRC. You should have to explicitly pass sort=True to get the sorting behavior.
Yep, you are right about that. I think I had that in my implementation to ensure the behavior. Ignore me!
Nothing wrong with explicitly passing it ...
One thought: there's no reason not to add a "left_ordering" keyword argument, passed via **kwargs and defaulting to False, to the various explicitly named set methods. There's obviously no way to pass it to the infix operators, but for intersection, union, etc. you could offer the best of both worlds: provide correct set operations by default, or let callers explicitly choose to break the convention of commutativity. Though symmetric_difference with multiple others might get a bit weird.
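A rough sketch of the "left_ordering" idea, using plain lists rather than FrameSet objects; the function names mirror the set methods, but the option itself is hypothetical and not part of fileseq. **kwargs is used instead of a keyword-only argument so the signature also parses under Python 2. The default returns the set result sorted (an assumption) so it stays commutative, while left_ordering=True keeps the left operand's order. symmetric_difference is left out, since its ordering with multiple others is the awkward case mentioned above.

```python
def _pop_left_ordering(kwargs):
    # Emulate a keyword-only flag in a Python 2 compatible way.
    left_ordering = kwargs.pop("left_ordering", False)
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    return left_ordering


def intersection(left, *others, **kwargs):
    left_ordering = _pop_left_ordering(kwargs)
    common = set(left)
    for other in others:
        common.intersection_update(other)
    if left_ordering:
        # Break commutativity on request: keep the left operand's order.
        return [f for f in left if f in common]
    return sorted(common)


def union(left, *others, **kwargs):
    left_ordering = _pop_left_ordering(kwargs)
    if not left_ordering:
        result = set(left)
        for other in others:
            result.update(other)
        return sorted(result)
    # Left operand's frames first, then new frames from each other in turn.
    seen = set()
    result = []
    for frames in (left,) + others:
        for f in frames:
            if f not in seen:
                seen.add(f)
                result.append(f)
    return result


a = [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]
b = list(range(1, 11))
print(intersection(a, b) == intersection(b, a))   # True
print(intersection(a, b, left_ordering=True))     # [1, 4, 7, 10, 2, 3, 5, 6, 8, 9]
print(union([5, 1], [2, 1], left_ordering=True))  # [5, 1, 2]
```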
M