polygon-clipping
polygon-clipping copied to clipboard
Reducing loops
Hey @mfogel
Been looking at the performance of this lib and one of the things I noticed is that is spends a lot of time in the startup before it even gets to the algorithm.
I was wondering if we could reduce the number of loops in the startup., eg currently it loops through all the points twice, once for pointsAsObjects
and then again for new geomIn.MultiPoly
. I'm thinking we could combine the loops here relatively simply, but at the same time we'd get a fairly good performance gain.
What do you think?
Hi @rowanwins - yes, there's a number of O(n) or faster loops over the input that happen before the inputs get to the meat of the algorithm. The two you mentioned, and there's also cleanInput.forceMultiPoly
and cleanInput.cleanMultiPoly
. Some or all of those could be combined.
One thing I'd like maintain (or better yet improve) around this section is helpful error messages on invalid input.
Awesome, I'll take a look with an eye on maintaining the input validation.
Ps You'll have to be patient with my tests, I'm not very familiar with jest
so i might ask a few silly questions along the way :)
Just a quick report back here, my initial attempts at getting any performance boosts out of this refactoring have been minimal which is disappointing.
I wonder if there is anything about the various compare functions that would be exceedingly slow?
I might try and hook up flamebearer to see if that shows anything...
Flamebearer analysis done, I've attached a zipped html file with the results.
And here is an image
If I'm reading the thing correctly the things that stand out to me are
-
compareBefore
-
getAvailableLinkedEvents
-
_isValidEdgeForPoly
Anyway further investigation on another night
Hi @mfogel
Great work on the latest release, it looks like things are coming along very nicely in terms of stability & accuracy so I'm wonder if it's worth revisiting some of the performance work?
A couple of other things I've spotted in addition to some of the bits above.
-
Error('Input has more than two coordinates. Only 2-dimensional polygons supported.'
A length test is being run on every single coord array to see if it's a 2D coord. A few options to perhaps streamline this
- Skip the check altogether and just have it as part of the readme that only 2D polys are supported. Additional values in that array make no material difference as to whether things are going to fail or not.
- If you wanted to keep the check could we reduce its impact by checking the first coord only to see if it's 3d? Normally data is stored in a fairly consistent manner so if the first coord has 2 dimensions it can be presumed the rest does?
- Removing redundant coords on both
cleanRing
on the input steps andgetGeom
in the output steps
- Some checking is being done on both input and output to remove redundant coords on a ring. I don't think either of these checks are strictly necessary in ensuring the algorithm works. What's your feeling on retaining these checks, is there any compelling reason to keep them? Could they be retained via a flag/option?
Cheers Rowan
hi @rowanwins, thanks for the thoughts. i think those are good points, we can lighten up those checks.
On the first I'm ok removing the check altogether and just adding something to the readme stating that data beyond the 2nd dimension will just be ignored.
For the 2nd point (removing redundant coords):
- as far as removing from the input, if the end-to-end tests still pass without this extra step, then we can probably just remove it altogether.
- as far as from the output, this might be a place to start introducing options to the library. i'm not really sure what the best way might be to allow for options to be passed into the library... I like the simple API that's currently presented with the variable number of arguments, but my understanding is that precludes the use of optional arguments to those functions. So I was thinking maybe something like:
let pc = new polygonClipping.PolygonClipping({ skip_output_redundancy_checks: true})
pc.union(poly1, poly2, poly3 ...)
or maybe
polygonClipping.unionWithOptions({skip_output_redundancy_checks: true}, poly1, poly2, poly3 ...)
thoughts?
hi @mfogel , maybe add an extra tool / API call for removing redundant coords, making it optional for the user. or make it as an config, but maybe off by default if it's not strictly necessary and reduces performance. could be interesting to see a benchmark with or without this operation.
Hmmm yeah the class option could work if there were additional things that could be configured however for a single setting it doesn't quite seem worth it...
As suggested by @erf perhaps it could be a seperate operation run by the user prior eg polygonClipping.cleanCoords()
?
Something like polygonClipping.unionWithOptions({skip_output_redundancy_checks: true}, poly1, poly2, poly3 ...)
feels a bit messy IMHO
Another alternative could the second set of geometries be supplied as an array? So something like
polygonClipping.union(poly1, [poly2, poly3], {
skip_output_redundancy_checks: true
})
Or you could force the second set of geometries to be a multipoly, this is the approach taken by the martinez
library, this would force a bit more data prep on the part of the user.
If I have any other brainwaves I'll share