pyrosm icon indicating copy to clipboard operation
pyrosm copied to clipboard

custom_filter get_pois not working with value True

Open mattijsdp opened this issue 1 year ago • 6 comments

Describe the bug The custom_filter does not work as one would expect when using the catch-all 'True' as a value in combination with another tag key. If I understand the documentation correctly, then this should return all osm records with that tag key and this return more entities compared to when one specifies a tag key and value.

To Reproduce

from pyrosm import OSM, get_data
import shapely
import h3

# Some h3 cell in New York
bb = shapely.Polygon(h3.h3_to_geo_boundary("882a107749fffff", geo_json=True))

state = "New York"
fp = get_data("New York")
osm = OSM(fp, bounding_box=bb)

# Get all buildings and "railway": "station"
pois = osm.get_pois(custom_filter={"building": True, "railway": ["station"]})
print(len(pois)) # 4

# Get all "building": "yes" and "railway": "station"
pois = osm.get_pois(custom_filter={"building": ["yes"], "railway": ["station"]})
print(len(pois)) # 2152

Expected behavior Using True should always lead to a geq number of records than specifying some tag value.

Environment:

  • OS: Linux
  • Python package source (PyPi, conda, ...): pyrosm, 0.6.2, py311hb755f60_1, conda-forge
  • Versions of Python, Java Development Kit, Python modules: 3.11.6

Additional context I realise there is a function called get_buildings() which would be more suited if you are only interested in buildings. I am interested in POIs but some buildings I would also like to include. In general, I am using a somewhat involved custom_filter with True and this example shows that I either don't understand the implementation or there is a bug.

Great work on the package though, very useful!!

mattijsdp avatar Dec 12 '23 12:12 mattijsdp

I believe it is because in the line below, the tag values of True aren't added to filter_keys nor data_filter. If all tag values are True then this is fine as data_filter is None is handled below that but not when it is a combination. https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_filter.pyx#L112C13-L112C13

mattijsdp avatar Dec 12 '23 12:12 mattijsdp

I see that overlapping_filter also plays a part. I think this should be set to True if there are any tags that are set to value True (catch-all) so that the for loop here is never broken early in these cases.

mattijsdp avatar Dec 12 '23 14:12 mattijsdp

I'm getting odd results with using the custom_filter for the get_boundaries method too.

Filtering with an osm id, yields more results than the same request without a custom filter.

my_filter = {'id':['639343']}
admin_boundaries = osm_data_obj.get_boundaries(custom_filter = my_filter)

yields 378 results that do not have the id '639343'. Without a custom filter there are only 245 results.

Similar things are true with filters: {'name':['Regierungsbezirk Karlsruhe']} yields 245 results, includes only 1 with that exact name {'admin_level':['6']} yields 245 results as well. includes only 10 with that admin_level

using a custom filter for the get_data_by_custom_criteria method seems to work as expected however.

Chwiggy avatar Dec 15 '23 15:12 Chwiggy

@Chwiggy could you share the bounding box or area you're using?

mattijsdp avatar Dec 21 '23 09:12 mattijsdp

This is the geometry i was using:

{ "type": "Feature",
    "properties": { "name": "2023_rnv_gtfs.zip", "path": "filepath" },
    "geometry": { 
         "type": "Polygon",
         "coordinates": [ [ [ 8.1704019, 49.3432631 ], [ 8.811395, 49.3432631 ], [ 8.811395, 49.5811911 ], [ 8.1704019, 49.5811911 ], [ 8.1704019, 49.3432631 ] ] ]
    } 
}

the osm.pbf file was cropped to this bounding box from a larger file with osmosis with completeWays=yes

Chwiggy avatar Dec 21 '23 12:12 Chwiggy

@Chwiggy please see my MR for a fix.

Your use case works as expected though. It is just because get_boundaries adds additional tag key:value combinations as can be seen here. If the boundary tag key isn't specified then it is added as True thereby including all osm entities with the tag boundary

mattijsdp avatar Jan 16 '24 17:01 mattijsdp