polygon-clipping icon indicating copy to clipboard operation
polygon-clipping copied to clipboard

Reducing loops

Open rowanwins opened this issue 5 years ago • 8 comments

Hey @mfogel

Been looking at the performance of this lib and one of the things I noticed is that is spends a lot of time in the startup before it even gets to the algorithm.

I was wondering if we could reduce the number of loops in the startup., eg currently it loops through all the points twice, once for pointsAsObjects and then again for new geomIn.MultiPoly. I'm thinking we could combine the loops here relatively simply, but at the same time we'd get a fairly good performance gain.

What do you think?

rowanwins avatar Jul 20 '18 11:07 rowanwins

Hi @rowanwins - yes, there's a number of O(n) or faster loops over the input that happen before the inputs get to the meat of the algorithm. The two you mentioned, and there's also cleanInput.forceMultiPoly and cleanInput.cleanMultiPoly. Some or all of those could be combined.

One thing I'd like maintain (or better yet improve) around this section is helpful error messages on invalid input.

mfogel avatar Jul 20 '18 17:07 mfogel

Awesome, I'll take a look with an eye on maintaining the input validation.

Ps You'll have to be patient with my tests, I'm not very familiar with jestso i might ask a few silly questions along the way :)

rowanwins avatar Jul 20 '18 23:07 rowanwins

Just a quick report back here, my initial attempts at getting any performance boosts out of this refactoring have been minimal which is disappointing.

I wonder if there is anything about the various compare functions that would be exceedingly slow?

I might try and hook up flamebearer to see if that shows anything...

rowanwins avatar Jul 30 '18 12:07 rowanwins

Flamebearer analysis done, I've attached a zipped html file with the results.

And here is an image flame

If I'm reading the thing correctly the things that stand out to me are

  • compareBefore
  • getAvailableLinkedEvents
  • _isValidEdgeForPoly

Anyway further investigation on another night

rowanwins avatar Jul 30 '18 13:07 rowanwins

Hi @mfogel

Great work on the latest release, it looks like things are coming along very nicely in terms of stability & accuracy so I'm wonder if it's worth revisiting some of the performance work?

A couple of other things I've spotted in addition to some of the bits above.

  1. Error('Input has more than two coordinates. Only 2-dimensional polygons supported.' A length test is being run on every single coord array to see if it's a 2D coord. A few options to perhaps streamline this
  • Skip the check altogether and just have it as part of the readme that only 2D polys are supported. Additional values in that array make no material difference as to whether things are going to fail or not.
  • If you wanted to keep the check could we reduce its impact by checking the first coord only to see if it's 3d? Normally data is stored in a fairly consistent manner so if the first coord has 2 dimensions it can be presumed the rest does?
  1. Removing redundant coords on both cleanRing on the input steps and getGeom in the output steps
  • Some checking is being done on both input and output to remove redundant coords on a ring. I don't think either of these checks are strictly necessary in ensuring the algorithm works. What's your feeling on retaining these checks, is there any compelling reason to keep them? Could they be retained via a flag/option?

Cheers Rowan

rowanwins avatar Apr 01 '19 00:04 rowanwins

hi @rowanwins, thanks for the thoughts. i think those are good points, we can lighten up those checks.

On the first I'm ok removing the check altogether and just adding something to the readme stating that data beyond the 2nd dimension will just be ignored.

For the 2nd point (removing redundant coords):

  • as far as removing from the input, if the end-to-end tests still pass without this extra step, then we can probably just remove it altogether.
  • as far as from the output, this might be a place to start introducing options to the library. i'm not really sure what the best way might be to allow for options to be passed into the library... I like the simple API that's currently presented with the variable number of arguments, but my understanding is that precludes the use of optional arguments to those functions. So I was thinking maybe something like:
let pc = new polygonClipping.PolygonClipping({ skip_output_redundancy_checks: true})
pc.union(poly1, poly2, poly3 ...)

or maybe

polygonClipping.unionWithOptions({skip_output_redundancy_checks: true}, poly1, poly2, poly3 ...)

thoughts?

mfogel avatar Apr 02 '19 18:04 mfogel

hi @mfogel , maybe add an extra tool / API call for removing redundant coords, making it optional for the user. or make it as an config, but maybe off by default if it's not strictly necessary and reduces performance. could be interesting to see a benchmark with or without this operation.

erf avatar Apr 02 '19 21:04 erf

Hmmm yeah the class option could work if there were additional things that could be configured however for a single setting it doesn't quite seem worth it...

As suggested by @erf perhaps it could be a seperate operation run by the user prior eg polygonClipping.cleanCoords()?

Something like polygonClipping.unionWithOptions({skip_output_redundancy_checks: true}, poly1, poly2, poly3 ...) feels a bit messy IMHO

Another alternative could the second set of geometries be supplied as an array? So something like

polygonClipping.union(poly1, [poly2, poly3], {
   skip_output_redundancy_checks: true
})

Or you could force the second set of geometries to be a multipoly, this is the approach taken by the martinez library, this would force a bit more data prep on the part of the user.

If I have any other brainwaves I'll share

rowanwins avatar Apr 03 '19 05:04 rowanwins