pyrosm
pyrosm copied to clipboard
custom_filter get_pois not working with value True
Describe the bug The custom_filter does not work as one would expect when using the catch-all 'True' as a value in combination with another tag key. If I understand the documentation correctly, then this should return all osm records with that tag key and this return more entities compared to when one specifies a tag key and value.
To Reproduce
from pyrosm import OSM, get_data
import shapely
import h3
# Some h3 cell in New York
bb = shapely.Polygon(h3.h3_to_geo_boundary("882a107749fffff", geo_json=True))
state = "New York"
fp = get_data("New York")
osm = OSM(fp, bounding_box=bb)
# Get all buildings and "railway": "station"
pois = osm.get_pois(custom_filter={"building": True, "railway": ["station"]})
print(len(pois)) # 4
# Get all "building": "yes" and "railway": "station"
pois = osm.get_pois(custom_filter={"building": ["yes"], "railway": ["station"]})
print(len(pois)) # 2152
Expected behavior Using True should always lead to a geq number of records than specifying some tag value.
Environment:
- OS: Linux
- Python package source (PyPi, conda, ...):
pyrosm, 0.6.2, py311hb755f60_1, conda-forge
- Versions of Python, Java Development Kit, Python modules: 3.11.6
Additional context
I realise there is a function called get_buildings()
which would be more suited if you are only interested in buildings. I am interested in POIs but some buildings I would also like to include. In general, I am using a somewhat involved custom_filter
with True
and this example shows that I either don't understand the implementation or there is a bug.
Great work on the package though, very useful!!
I believe it is because in the line below, the tag values of True
aren't added to filter_keys
nor data_filter
. If all tag values are True then this is fine as data_filter is None
is handled below that but not when it is a combination.
https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_filter.pyx#L112C13-L112C13
I see that overlapping_filter
also plays a part. I think this should be set to True if there are any tags that are set to value True (catch-all) so that the for loop here is never broken early in these cases.
I'm getting odd results with using the custom_filter
for the get_boundaries
method too.
Filtering with an osm id, yields more results than the same request without a custom filter.
my_filter = {'id':['639343']}
admin_boundaries = osm_data_obj.get_boundaries(custom_filter = my_filter)
yields 378 results that do not have the id '639343'. Without a custom filter there are only 245 results.
Similar things are true with filters:
{'name':['Regierungsbezirk Karlsruhe']}
yields 245 results, includes only 1 with that exact name
{'admin_level':['6']}
yields 245 results as well. includes only 10 with that admin_level
using a custom filter for the get_data_by_custom_criteria
method seems to work as expected however.
@Chwiggy could you share the bounding box or area you're using?
This is the geometry i was using:
{ "type": "Feature",
"properties": { "name": "2023_rnv_gtfs.zip", "path": "filepath" },
"geometry": {
"type": "Polygon",
"coordinates": [ [ [ 8.1704019, 49.3432631 ], [ 8.811395, 49.3432631 ], [ 8.811395, 49.5811911 ], [ 8.1704019, 49.5811911 ], [ 8.1704019, 49.3432631 ] ] ]
}
}
the osm.pbf file was cropped to this bounding box from a larger file with osmosis with completeWays=yes
@Chwiggy please see my MR for a fix.
Your use case works as expected though. It is just because get_boundaries
adds additional tag key:value combinations as can be seen here. If the boundary
tag key isn't specified then it is added as True
thereby including all osm entities with the tag boundary