mosaic
Linear feature matching/querying/joining
I work for a state road authority. Much of our data is held on linestring geometries, and these geometries 'shift' over time and vary between data providers. HERE lines (arcs) aren't the same as OSM or TomTom lines, so matching and querying attributes across the datasets is difficult.
Being a spatial person, I deal with pictures better. Say we locate features on the green line, but then we need to query the red line... and attribute the extent of the blue line against the red line.

This is essentially our current practice

Whereas I think we should be doing something like ...

The loop on the right is where I envisage Databricks would be utilised.
I will understand if this is not on the current road map. When we talk to people like ESRI they say "We'll get back to you" and we never hear from them again...
Line-on-line matching is one of the most difficult spatial problems, if not the most difficult.
Here is your mission....
@Bandit253 thank you for reporting the issue/feature request. Would you be open to a chat about this? I would like to discuss it in a bit more detail, and I am happy to organize something at a time that suits you. Could you drop me an email at [email protected]? Once we identify a clear scope, we can work on either a feature request or a solution accelerator built using Mosaic, whichever makes more sense.
Thanks for your response, @milos-colic. I'll send an email to discuss.
A potentially useful link; this can be used as a basis for matching arcs (lines): https://www.microsoft.com/en-us/research/publication/hidden-markov-map-matching-noise-sparseness/
I do not mean to hijack this issue, but given that you linked a map-matching resource, I would be keen to hear more about the potential solutions you have in mind for this problem, @milos-colic. I can also reach out via email if that is preferred.
We have some ideas around operating in vector-tile space, i.e. on local geometries after they are tessellated to a grid index system. We can then use a Jaccard-like distance to produce candidates, which should be ordered by a specified spatial distance. This could be a minimal distance, or a dynamic time warping distance if we decide a line feature is similar to what time series data is in another domain. Really, we can expose several strategies there to define what "near" means and what "nearest" means.
We know there is no single straightforward answer, hence we are trying to package up a technique that is fairly customisable, to avoid narrowing things down too much.
Anything specific to a use case is better handled by email, since this channel is public. Anything generic, at the level of "I need to match multiple layers of geo data that may have different precision, and joins aren't resolving well", can be kept here.
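To make the idea above concrete, here is a minimal sketch in plain Python. It is not Mosaic code and uses no Mosaic API: a simple square grid stands in for the grid index system, and the names `densify`, `cells`, `jaccard`, `dtw`, `match` and the parameters `cell_size`, `step`, `min_jaccard` are all assumptions made for illustration. Coordinates are assumed to be planar (projected), so Euclidean distance is meaningful.

```python
import math

def densify(line, step):
    """Sample points along a polyline so consecutive points are at most `step` apart."""
    points = [line[0]]
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        points.extend((x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
                      for i in range(1, n + 1))
    return points

def cells(line, cell_size):
    """Square grid cells touched by the line (a stand-in for a real grid index)."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size))
            for x, y in densify(line, cell_size / 2)}

def jaccard(a, b):
    """Jaccard similarity of two cell sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dtw(p, q):
    """Dynamic time warping cost between two point sequences."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(q)
    for a in p:
        cur = [inf]
        for j, b in enumerate(q, start=1):
            cur.append(math.dist(a, b) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

def match(query, candidates, cell_size=10.0, step=5.0, min_jaccard=0.2):
    """Keep candidates whose cell sets overlap the query's, then rank them by DTW cost."""
    q_cells = cells(query, cell_size)
    q_points = densify(query, step)
    ranked = []
    for name, line in candidates.items():
        if jaccard(q_cells, cells(line, cell_size)) >= min_jaccard:
            ranked.append((dtw(q_points, densify(line, step)), name))
    return [name for _, name in sorted(ranked)]
```

For example, `match([(0, 0), (100, 0)], {"red": [(0, 3), (100, 4)], "blue": [(0, 50), (100, 50)]})` keeps `red` as a candidate and drops `blue`, because `blue` shares no grid cells with the query. One limitation of this sketch: two near-parallel lines that sit on either side of a cell boundary share no cells, so a real implementation would probably need to buffer the lines or include neighbouring cells before computing the overlap.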
One thing I would add to what I said above: we also have several options for what the output is. Is it the most central feature from a feature candidate set (to know which one is most central we need the distance strategy), or is it some sort of aliasing between the features in the candidate set? Both have pros and cons, and any opinions on these would be useful so we can make the best choices.
Absolutely correct; the analogy with time series is perfect. I have compiled a set of layers/features to demonstrate the use case. I will forward them via email.
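As an illustration of the first option, the "most central feature" can be read as the medoid of the candidate set: the feature with the smallest total distance to all the others. The sketch below is one possible interpretation, not the planned implementation. `hausdorff` and `most_central` are hypothetical names, and a vertex-to-vertex Hausdorff distance is used only as an example distance strategy; any other strategy could be passed in its place.

```python
import math

def hausdorff(p, q):
    """Discrete (vertex-to-vertex) Hausdorff distance between two point sequences."""
    def directed(a, b):
        return max(min(math.dist(u, v) for v in b) for u in a)
    return max(directed(p, q), directed(q, p))

def most_central(features, distance=hausdorff):
    """Return the key of the medoid: the feature with the smallest summed distance to the rest."""
    return min(
        features,
        key=lambda k: sum(distance(features[k], other) for other in features.values()),
    )
```

With three roughly parallel candidates at offsets 0, 1 and 3 units, the middle one has the smallest summed distance and is returned. The choice of distance strategy changes which feature counts as central, which is why the two decisions are linked.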
Hi Milos, Rapt to see some activity on the issue.
Rob