gdal icon indicating copy to clipboard operation
gdal copied to clipboard

gdal vector simplify-coverage: implement progress callback

Open rouault opened this issue 6 months ago • 6 comments

CC @dbaston

rouault avatar May 29 '25 14:05 rouault

I didn't add a progress callback because > 90% of the time was inside GEOS.

I've thought about making the GDAL <-> GEOS conversions more efficient (WKB is a slow way to do it) but for any real dataset the conversion just doesn't amount to anything.

dbaston avatar May 29 '25 14:05 dbaston

I didn't add a progress callback because > 90% of the time was inside GEOS.

thanks for the feedback! I've just readjusted the progress ratio based on that. Yes the progress will be a bit of little use then, but at least the user will see some non-zero percentage at the beginning... Or if they process a dataset with many layers

rouault avatar May 29 '25 15:05 rouault

@dbaston Do you believe the GEOS team would welcome enhancements to add progress report and cancellation for this algorithm? (to be checked if that's possible in that case). I don't see any precedent in the C API (the interrupt callback is the closest, but doesn't cover the progress part)

rouault avatar May 29 '25 15:05 rouault

but at least the user will see some non-zero percentage at the beginning...

OTOH getting stuck at 5% before jumping to 100% can be confusing.

Do you believe the GEOS team would welcome enhancements to add progress report and cancellation for this algorithm?

Yes, an enhancement to make the algorithm interruptible would definitely be welcome (and just involves calling GEOS_CHECK_FOR_INTERRUPTS() at appropriate points and frequencies.

Accurately estimating progress is pretty tricky -- the parts of an algorithm that cause the bottleneck can be data-dependent (and parameter-dependent). Maybe @dr-jts has given it some thought.

dbaston avatar May 29 '25 15:05 dbaston

Accurately estimating progress is pretty tricky -- the parts of an algorithm that cause the bottleneck can be data-dependent (and parameter-dependent). Maybe @dr-jts has given it some thought.

There are several phases to the CoverageSimplifier algorithm, each with a main loop iterating over a list of objects (e.g. input polygons and edges). The primary one is probably the one in TPVWSimplifier.simplify. A crude progress metric is the item count through this loop. More accurate would be to count the number of vertices processed out of the total. And most accurate would be to add counters to all the main loops (with some estimate of fraction of overall time taken for each).

Surprisingly, this is the first time there's been an ask about progress reporting in GEOS (although it has been discussed in JTS). It seems like it could be added fairly transparently by using a global or thread-local callback.

dr-jts avatar May 29 '25 21:05 dr-jts

Some timings from simplifying TIGER state boundaries with a tolerance of 1e-3...

97% of runtime in GDALVectorSimplifyCoverageOutputDataset::Process(OGRLayer&, OGRLayer&) 93% of runtime in geos::coverage::CoverageSimplifier::simplify(double)

which is split into 40% in geos::coverage::CoverageRingEdges::build 52% in geos::coverage::TPVWSimplifier::simplify

So, a roughly 50/50 split between figuring out the shared edges and progressively removing vertices. And < 5% GEOS conversion overhead.

dbaston avatar May 30 '25 15:05 dbaston

The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 28 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular link to any issues which this pull request fixes

  • that you have written unit tests where possible In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request. If there is no further activity on this pull request, it will be closed in 2 weeks.

github-actions[bot] avatar Jun 28 '25 02:06 github-actions[bot]

While we hate to see this happen, this PR has been automatically closed because it has not had any activity in the last 6 weeks. If this pull request should be reconsidered, please follow the guidelines in the previous comment and reopen this pull request. Or, if you have any further questions, just ask! We love to help, and if there's anything the GDAL project can do to help push this PR forward please let us know how we can assist.

github-actions[bot] avatar Jul 12 '25 03:07 github-actions[bot]