geos
geos copied to clipboard
Add functions to interrupt a single thread
This PR changes GEOS_interruptRegisterCallback to operate on a per-thread basis; the callback will only be registered on the thread from which GEOS_interruptRegisterCallback is called. This would break existing expectations of a global callback, so maybe a better solution is to add a new function that registers a thread-local callback.
I'm fuzzy on how this gets used outside of PostGIS which (I think?) would not be affected by this change.
cc @nyalldawson
I'm fuzzy on how this gets used outside of PostGIS which (I think?) would not be affected by this change.
In Shapely we don't use this callback. @caspervdw experimented with this callback earlier this year (https://github.com/shapely/shapely/pull/1370), but we ended up implementing a way to interrupt on the shapely side (so in between calls to GEOS).
@jorisvandenbossche
(so in between calls to GEOS).
The big benefit of this call is that it can potentially interrupt stupidly long geos internal methods -- ie. when a user tries to intersection a >million vertex polygon with another >million vertex polygon :rofl: . Right now there's no clean way to interrupt those operations without completely killing the whole process.
@dbaston
I'm fuzzy on how this gets used outside of PostGIS which (I think?) would not be affected by this change.
QGIS currently doesn't use it, because we can't isolate to individual threads. I'd love to use it to allow users to cancel long running geos operations though (such as the previously mentioned ridiculous intersection case)
The api for the interruption handling is a little tricky to understand. I gather I register the callback, and then from the callback I call GEOS_interruptRequest() when I want to interrupt a process. Is this correct?
(If so, this may be a nice opportunity to clean up that api and make the callback return bool for when the interruption is desired)
The api for the interruption handling is a little tricky to understand. I gather I register the callback, and then from the callback I call GEOS_interruptRequest() when I want to interrupt a process. Is this correct?
Yes, this is the case. (It took me a while to figure it out too!) The API sort of makes sense for a system where GEOS_interruptRequest interrupts the thread that calls it. Right now it seems that GEOS_interruptRequest will interrupt only whichever single thread happens to check for an interrupt next.
In your case, would you want to register a different callback for each thread? (Does the callback need to be thread-local?) Or would you use a single callback like I did in InterruptTest.cpp to decide if a given thread should be interrupted?
In your case, would you want to register a different callback for each thread? (Does the callback need to be thread-local?) Or would you use a single callback like I did in InterruptTest.cpp to decide if a given thread should be interrupted?
I could work with either. The per-thread/context callback would be cleaner though.
(so in between calls to GEOS).
The big benefit of this call is that it can potentially interrupt stupidly long geos internal methods
Yes, and so ideally we should certainly do both (and thus also implement the GEOS interrupt callback). The initial motivation of the PR I referenced was to be able to interrupt a call that would create too many geometries (and blow up memory and eventually crash), for example by accidentally creating billions of points (so many calls to a cheap GEOS function). For that we needed to use another mechanism, since (I think) the interrupt callback isn't checked in such cheap GEOS operations.
@dbaston I can work with this api :+1: many thanks!!
I could work with either. The per-thread/context callback would be cleaner though.
OK, with the current iteration you can interrupt the current thread using GEOS_interruptThread called from either a callback that executes on all threads (registered by the existing GEOS_interruptRegisterCallback) or a callback that executes on the current thread only (registered by a new GEOS_interruptRegisterThreadCallback). There are a few examples in InterruptTest.cpp and GEOSInterruptTest.cpp.
I used a new callback signature for GEOS_interruptRegisterThreadCallback, adding a void* so that an interrupt signal can be passed from a coordinating thread.
@dbaston I'm glad to see this work taking shape as it will help to address https://github.com/GEOSwift/GEOSwift/issues/190.
The proposed API lets you register a per-thread callback. I'm curious whether you've considered making it per-context rather than per-thread. My understanding is that GEOS contexts are meant to be confined to a single thread anyway, so tying the interrupt callback to the context seems like it could also solve the issue. It might also fit nicely alongside other existing context-specific APIs like the ones to register error and message handlers.
It could also be easier to use from systems where you don't select which thread to use directly (e.g. Grand Central Dispatch on Apple platforms, where you schedule work to be performed and the work is executed by one of several threads in a pool). In GEOSwift, we actually use short-lived GEOS contexts that never out-live a single call into the GEOSwift API, so if we were going to use the interrupt API in the future, tying the lifetime of the callback to the lifetime of the context would be a natural design.
That said, I think we could use the per-thread design as well by simply un-registering the handler (which I see is possible by passing NULL for the new handler) before returning from each GEOSwift API call that uses this feature.
I'm curious whether you've considered making it per-context rather than per-thread.
Yes, and that would certainly make for a cleaner API. The problem is that the "context" is a C API construct to which the C++ library (the code actually being interrupted) has no access, so we would need to introduce the notion of a context into the C++ library in such a way that any interruptible code can access its current context. I'm not sure of a simple way to do that, but I'm open to ideas.
Maybe we could have a thread-local global variable in the C++ library that stores the context, and every C API call sets that global variable before calling code in the C++ library?
The problem is that the "context" is a C API construct to which the C++ library (the code actually being interrupted) has no access,
That might have perhaps performance implications, but couldn't that interrupt handler be registered at the C context level, and each time a C _r function is invoked it would automatically install the per-thread interrupt handler before calling the C++ method, and uninstalls it at return / when an exception is caught ?
I think that would also work. Pushing the context object into the C++ library might reduce the penalty because we would be setting/checking one thread-local variable instead of two. I would need to set up a benchmark to see if that actually matters.
See #803 for an alternative API that is based on context handles.
same here. I'm assuming this should go to 3.14 so switching, @dbaston @pramsey switch back if you disagree.