
Request combining a date clause with a regex on the key

Open naomap opened this issue 7 years ago • 5 comments

I have a request that combines a regular expression on the key with a date clause: http://overpass-turbo.eu/s/D11

It returns items that do not match the regex: these ways did not have a cycleway tag at that time. If I remove the [date:] clause, the returned items do match, because the cycleway tag is now present on those items. If I instead remove the ~ before "cycleway" but keep the date clause, the result is empty, as expected.

However the combination of the date clause with a ~ before the key does not seem to work properly.

naomap avatar Oct 23 '18 11:10 naomap

This seems to be a bug in filter_ids_by_tags.h, probably Tag_Entry_Listener_Key_Regex.

Simplified query:

[date:"2016-01-01T00:00:00Z"];
way(333047442)[~"cycleway"~"track"];
out meta;

Situation:

  • Version 3 (NOW) has a match on cycleway = track
  • Version 1 (which is relevant for 2016-01-01T00:00:00Z) doesn't have that tag, it was only introduced 5 months later => cycleway has void_tag
  • Logic fails to remove way 333047442 from the result.

Debug output:
Current Id: 333047442
Current_key: cycleway index.key: cycleway
Current_value: track index.value: track
Match!
Key-eval_id - id: 333047442
  eval_id: set second timestamp: NOW
  eval_id: set first timestamp 
  eval_id: current first timestamp: NOW

Current Id: 31624543
Current_key: cycleway index.key: cycleway:both
notify key: cycleway:both
commit_ids - timestamps: 1 entry - new_ids_ 0 entries
  Testing 333047442 NOW - NOW
  Adding 333047442 to new_ids


Attic Id: 333047442
Current_key: cycleway attic_index.key: cycleway
Current_value: <FF> attic_index.value: <FF>
timestamp: 2016-01-01T00:00:00Z attic ts: 2016-05-20T16:14:19Z
eval 0
Key-eval_id - id: 333047442
  eval_id: set second timestamp: 2016-05-20T16:14:19Z
  eval_id: current first timestamp: 0000-00-00T00:00:00Z (was not set, as void_tag is not relevant!)

Attic Id: 20358656
Current_key: cycleway attic index.key: cycleway:both
notify key: cycleway:both
commit_ids - timestamps:1 entry- new_ids_ 1 entry
  Testing 333047442 0000-00-00T00:00:00Z - 2016-05-20T16:14:19Z

-> 333047442 survives in new_ids_

tag_listeners: 1 entry
filter_ids 1 entry
commit_ids - timestamps:0 - new_ids_ 1 entry

In general, this seems to affect only Key_Regex. Value_Regex doesn't show this bug, as the reporter already noted:

[date:"2016-01-01T00:00:00Z"];
way(333047442)["cycleway"~"track"];
out meta;

mmd-osm avatar Dec 28 '18 15:12 mmd-osm

Somehow the logic in Tag_Entry_Listener_Key_Regex is not clear. I would have assumed an algorithm that keeps a list of "matched keys" for each object id, along with the lowest timestamp and an indicator of whether the key referred to a "void" value at that point in time:

Node Id     Matched Key       Lowest Timestamp        Value is Void?
43352544    cycleway          2017-06-01T00:00:00Z    false
43352544    cycleway:left     2017-03-01T00:00:00Z    true
43352544    cycleway:right    2017-02-01T00:00:00Z    true
58235723    cycleway          2017-01-15T00:00:00Z    true
58235723    cycleway:left     2017-01-11T00:00:00Z    true

When filtering, only those node ids would survive that have at least one matched key with "void value" = false. In the example above, only 43352544 would be part of the result (assuming it has been in old_ids before).
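The bookkeeping proposed above could be sketched roughly as follows. This is an illustration in Python, not the actual Overpass API code (which is C++); the names KeyEntry and surviving_ids are made up for this sketch, and the entries are the rows from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEntry:
    """One row of the table; a hypothetical structure, not an Overpass type."""
    object_id: int
    key: str
    timestamp: str   # ISO 8601 UTC, so plain string comparison orders correctly
    is_void: bool


def surviving_ids(entries, old_ids):
    # Keep, per (object id, matched key), only the entry with the lowest timestamp.
    lowest = {}
    for entry in entries:
        slot = (entry.object_id, entry.key)
        if slot not in lowest or entry.timestamp < lowest[slot].timestamp:
            lowest[slot] = entry
    # An id survives if at least one of its matched keys is not void.
    alive = {entry.object_id for entry in lowest.values() if not entry.is_void}
    return alive & set(old_ids)


entries = [
    KeyEntry(43352544, "cycleway", "2017-06-01T00:00:00Z", False),
    KeyEntry(43352544, "cycleway:left", "2017-03-01T00:00:00Z", True),
    KeyEntry(43352544, "cycleway:right", "2017-02-01T00:00:00Z", True),
    KeyEntry(58235723, "cycleway", "2017-01-15T00:00:00Z", True),
    KeyEntry(58235723, "cycleway:left", "2017-01-11T00:00:00Z", True),
]

print(sorted(surviving_ids(entries, {43352544, 58235723})))
```

With both ids in old_ids, only 43352544 remains, because 58235723 has no matched key with a non-void value.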

mmd-osm avatar Dec 28 '18 22:12 mmd-osm

Fixed in e5c7e73b93da90bc493c8c157de66c578b2e2d9b

The logic is that filter_ids_by_tags(.. attic ..) in filter_ids_by_tags.h processes one key after another. Once a key that matches the regex has been completed and the object has passed the filter, it is assured that the object is a valid result. The error came from performing this check separately for current and attic data, so the object was accepted prematurely because it passed based on the current data alone.
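To illustrate the premature acceptance described above, here is a small Python sketch. It is not the Overpass implementation; it assumes, based on the debug output earlier in this thread, that an attic entry carries the timestamp until which its value was valid and that a void value (shown as <FF> in the trace) means the tag was absent. The function names are hypothetical.

```python
import re


def value_at(date, current_value, attic):
    """Return the tag value valid at `date`, or None if the tag was void.

    attic: list of (valid_until, value) pairs; value None stands for a void tag.
    Timestamps are ISO 8601 UTC strings, so string comparison orders them.
    """
    later = [(until, value) for until, value in attic if until > date]
    if later:
        # The attic entry that expires soonest after `date` describes that date.
        return min(later, key=lambda pair: pair[0])[1]
    return current_value


def matches_at(date, current_value, attic, value_regex):
    # Combined check: current and attic data are evaluated together.
    value = value_at(date, current_value, attic)
    return value is not None and re.search(value_regex, value) is not None


def matches_current_only(current_value, value_regex):
    # Separate check on current data only: this is the premature acceptance.
    return current_value is not None and re.search(value_regex, current_value) is not None


# Way 333047442, key "cycleway": void until 2016-05-20T16:14:19Z, "track" now.
attic = [("2016-05-20T16:14:19Z", None)]
print(matches_current_only("track", "track"))                         # True
print(matches_at("2016-01-01T00:00:00Z", "track", attic, "track"))    # False
```

Looking at the current data alone accepts the way, while evaluating current and attic data together rejects it for 2016-01-01.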

drolbr avatar Apr 03 '19 11:04 drolbr

Well done, thanks for the fix!

naomap avatar Apr 03 '19 12:04 naomap

There's still something strange here:

[date:"2016-01-01T00:00:00Z"];
way(161769096)[highway][~"^cycleway$"~"^track$"];
out geom meta;

returns way 161769096 in version 1. I'd expect an empty response in this case because of <tag k="cycleway" v="no"/>

cycleway=track was introduced in version 2 only, dated 2016-05-20T16:14:20Z

On the other hand, the following query returns an empty result as expected.

[date:"2016-01-01T00:00:00Z"];
way(161769096)[highway]["cycleway"~"track"];
out geom;
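For reference, the expected outcome of the key regex filter on version 1 can be checked with a short Python sketch. This only models the intended semantics of [~"^cycleway$"~"^track$"] (some tag whose key matches the first regex and whose value matches the second); it is not how Overpass evaluates the filter internally. The highway value below is a placeholder, since only cycleway=no is known from the data quoted above.

```python
import re


def key_value_regex_match(tags, key_regex, value_regex):
    # True if at least one tag has a key matching key_regex
    # and a value matching value_regex (unanchored search, like ~ in Overpass QL).
    return any(
        re.search(key_regex, key) and re.search(value_regex, value)
        for key, value in tags.items()
    )


# Version 1 of way 161769096: cycleway=no is from the thread,
# the highway value is a hypothetical placeholder.
version_1_tags = {"highway": "placeholder", "cycleway": "no"}
# Version 2 introduced cycleway=track.
version_2_tags = {"highway": "placeholder", "cycleway": "track"}

print(key_value_regex_match(version_1_tags, r"^cycleway$", r"^track$"))
print(key_value_regex_match(version_2_tags, r"^cycleway$", r"^track$"))
```

Under these semantics version 1 does not match and version 2 does, which is why an empty result would be expected for the 2016-01-01 query.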

(Edit: fixed copy & paste error)

mmd-osm avatar Dec 11 '21 21:12 mmd-osm