Request combining a date clause with a regex on the key
I have a request that combines a regular expression on the key with a date clause: http://overpass-turbo.eu/s/D11
It returns items that do not match the regex: these ways did not have a cycleway tag at that time. If I remove the [date:] clause, the returned items do match, because the cycleway tag is now present on them. If I instead remove the ~ before "cycleway" but keep the date clause, the result is empty, as expected.
However, the combination of the date clause with a ~ before the key does not seem to work properly.
This seems to be a bug in filter_ids_by_tags.h, probably Tag_Entry_Listener_Key_Regex.
Simplified query:
[date:"2016-01-01T00:00:00Z"];
way(333047442)[~"cycleway"~"track"];
out meta;
Situation:
- Version 3 (NOW) has a match on cycleway = track
- Version 1 (which is relevant for 2016-01-01T00:00:00Z) doesn't have that tag, it was only introduced 5 months later => cycleway has void_tag
- Logic fails to remove way 333047442 from result.
Current Id: 333047442
Current_key: cycleway index.key: cycleway
Current_value: track index.value: track
Match!
Key-eval_id - id: 333047442
eval_id: set second timestamp: NOW
eval_id: set first timestamp
eval_id: current first timestamp: NOW
Current Id: 31624543
Current_key: cycleway index.key: cycleway:both
notify key: cycleway:both
commit_ids - timestamps: 1 entry - new_ids_ 0 entries
Testing 333047442 NOW - NOW
Adding 333047442 to new_ids
Attic Id: 333047442
Current_key: cycleway attic_index.key: cycleway
Current_value: <FF> attic_index.value: <FF>
timestamp: 2016-01-01T00:00:00Z attic ts: 2016-05-20T16:14:19Z
eval 0
Key-eval_id - id: 333047442
eval_id: set second timestamp: 2016-05-20T16:14:19Z
eval_id: current first timestamp: 0000-00-00T00:00:00Z (was not set, as void_tag is not relevant!)
Attic Id: 20358656
Current_key: cycleway attic index.key: cycleway:both
notify key: cycleway:both
commit_ids - timestamps:1 entry- new_ids_ 1 entry
Testing 333047442 0000-00-00T00:00:00Z - 2016-05-20T16:14:19Z
-> 333047442 survives in new_ids_
tag_listeners: 1 entry
filter_ids 1 entry
commit_ids - timestamps:0 - new_ids_ 1 entry
In general, this seems to affect Key_Regex only. Value_Regex doesn't show this bug, as the reporter already noted:
[date:"2016-01-01T00:00:00Z"];
way(333047442)["cycleway"~"track"];
out meta;
The logic in Tag_Entry_Listener_Key_Regex is not clear to me. I would have assumed an algorithm that keeps a list of "matched keys" for each object id, along with the lowest timestamp and an indicator of whether the key referred to a "void" value at that point in time:
| Node Id | Matched Key | Lowest Timestamp | Value is Void? |
|---|---|---|---|
| 43352544 | cycleway | 2017-06-01T00:00:00Z | false |
| 43352544 | cycleway:left | 2017-03-01T00:00:00Z | true |
| 43352544 | cycleway:right | 2017-02-01T00:00:00Z | true |
| 58235723 | cycleway | 2017-01-15T00:00:00Z | true |
| 58235723 | cycleway:left | 2017-01-11T00:00:00Z | true |
When filtering, only those node ids would survive that have at least one matched key with "void value" = false. In the example above, only 43352544 would be part of the result (assuming it has been in old_ids before).
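The bookkeeping proposed above could be sketched roughly as follows. This is only an illustration in Python, not the code in filter_ids_by_tags.h: the tuple layout, the `NOW` sentinel, and the function name `filter_ids_by_key_regex` are all assumptions made for the example. It also assumes that an attic entry's timestamp marks the moment until which its value applied, so entries at or before the requested date are ignored, and that the value regex is checked in addition to the "void" flag.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical entry layout: (object id, key, timestamp until which the value
# applied, value or None for a void tag). Timestamps are ISO 8601 strings in
# UTC, so plain string comparison orders them correctly.
TagEntry = Tuple[int, str, str, Optional[str]]

NOW = "9999-12-31T23:59:59Z"  # assumed sentinel for entries from current data


def filter_ids_by_key_regex(
    entries: Iterable[TagEntry],
    old_ids: Set[int],
    key_pattern: str,
    value_pattern: str,
    date: str,
) -> Set[int]:
    key_re = re.compile(key_pattern)
    value_re = re.compile(value_pattern)

    # (object id, matched key) -> (lowest timestamp seen, value at that time)
    lowest: Dict[Tuple[int, str], Tuple[str, Optional[str]]] = {}
    for obj_id, key, timestamp, value in entries:
        if obj_id not in old_ids or timestamp <= date:
            continue
        if not key_re.search(key):
            continue
        slot = (obj_id, key)
        if slot not in lowest or timestamp < lowest[slot][0]:
            lowest[slot] = (timestamp, value)

    # An id survives if at least one matched key is non-void and its value matches.
    return {
        obj_id
        for (obj_id, _key), (_ts, value) in lowest.items()
        if value is not None and value_re.search(value)
    }


# Way 333047442 from this report: current data has cycleway=track, while the
# attic data records a void tag that applied until 2016-05-20T16:14:19Z.
entries = [
    (333047442, "cycleway", NOW, "track"),
    (333047442, "cycleway", "2016-05-20T16:14:19Z", None),
]
at_2016 = filter_ids_by_key_regex(
    entries, {333047442}, "cycleway", "track", "2016-01-01T00:00:00Z"
)
at_2017 = filter_ids_by_key_regex(
    entries, {333047442}, "cycleway", "track", "2017-01-01T00:00:00Z"
)
```

With these inputs the way is dropped for the 2016 date, because the void attic entry carries the lowest relevant timestamp, and kept for the 2017 date, where only the current entry remains.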
Fixed in e5c7e73b93da90bc493c8c157de66c578b2e2d9b
The logic is that filter_ids_by_tags(.. attic ..) in filter_ids_by_tags.h processes one key after another. Once a key that matches the regex has been completed and the object has passed the filter, the object is assured to be a valid result. The error was that the check was done separately for current and attic data, so the object was accepted prematurely because it passed based on the current data.
Well done, thanks for the fix!
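To make the difference concrete, here is a deliberately simplified model of the two strategies for a single matched key. It does not reproduce the actual listener code; the helper names `value_at`, `accept_separately` and `accept_combined` are hypothetical, and attic entries are assumed to be (timestamp until which the value applied, value or None for a void tag) pairs.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Assumed attic layout: (timestamp until which the value applied, value or None)
AtticEntries = List[Tuple[str, Optional[str]]]


def value_at(date: str, current: Optional[str], attic: AtticEntries) -> Optional[str]:
    """Return the value in effect at `date`, or None if the tag was absent."""
    later = [entry for entry in attic if entry[0] > date]
    if later:
        # The attic entry with the lowest timestamp after `date` applied then.
        return min(later, key=lambda entry: entry[0])[1]
    return current


def accept_separately(
    date: str, current: Optional[str], attic: AtticEntries, value_re: Pattern[str]
) -> bool:
    # Models the premature acceptance: current data alone decides first.
    if current is not None and value_re.search(current):
        return True
    old = value_at(date, None, attic)
    return old is not None and bool(value_re.search(old))


def accept_combined(
    date: str, current: Optional[str], attic: AtticEntries, value_re: Pattern[str]
) -> bool:
    # Current and attic data are evaluated together before deciding.
    value = value_at(date, current, attic)
    return value is not None and bool(value_re.search(value))


track = re.compile("track")
attic = [("2016-05-20T16:14:19Z", None)]  # void tag for way 333047442
buggy = accept_separately("2016-01-01T00:00:00Z", "track", attic, track)
fixed = accept_combined("2016-01-01T00:00:00Z", "track", attic, track)
```

In this model the separate check accepts the way for 2016-01-01 because the current value matches, while the combined check rejects it because the void attic entry was in effect at that date.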
There's still something strange here:
[date:"2016-01-01T00:00:00Z"];
way(161769096)[highway][~"^cycleway$"~"^track$"];
out geom meta;
returns way 161769096 in version 1. I'd expect an empty response in this case because of <tag k="cycleway" v="no"/>
cycleway=track was only introduced in version 2, dated 2016-05-20T16:14:20Z.
On the other hand, the following query returns an empty result as expected.
[date:"2016-01-01T00:00:00Z"];
way(161769096)[highway]["cycleway"~"track"];
out geom;
(Edit: fixed copy & paste error)