pyrosm
ID in building tags overwriting way ID
When using get_buildings(), the resulting id column (which usually contains the way ID, etc.) is overwritten with the value of the id tag when that tag is present.
For example, in Dorset, England, there are a few cases where buildings have been tagged with id: 123.
When you then load this data using get_buildings(), the way ID is overwritten with the tag value:
from pyrosm import OSM
FILEPATH = "../data/raw/dorset-latest.osm.pbf"
osm = OSM(FILEPATH)
buildings = osm.get_buildings()
print(buildings[buildings["id"] == 123].head())
Output
start_date wikipedia id timestamp version \
181257 None None 123 1703193031 1
187693 None None 123 1704833732 1
193309 None None 123 1708889862 1
geometry tags osm_type \
181257 POLYGON ((-2.47921 50.62591, -2.47916 50.62579... None way
187693 POLYGON ((-2.46822 50.66240, -2.46839 50.66240... None way
193309 POLYGON ((-2.47700 50.62089, -2.47700 50.62078... None way
As you can see, people have tagged the buildings with duplicate IDs and these have made their way into the dataframe.
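A quick way to see how many rows are affected is to look for repeated values in the id column. The snippet below is a minimal sketch that assumes pandas is installed; it uses a small toy DataFrame as a stand-in for the result of get_buildings(), and its values are illustrative rather than real OSM data.

```python
import pandas as pd

# Toy stand-in for the GeoDataFrame returned by get_buildings().
# The ids are made up for illustration; they are not real OSM data.
buildings = pd.DataFrame(
    {
        "id": [123, 123, 123, 1001, 1002],
        "osm_type": ["way"] * 5,
    }
)

# keep=False flags every row of a duplicated group, not only the repeats.
duplicated_rows = buildings[buildings["id"].duplicated(keep=False)]

# Number of rows sharing each duplicated id.
duplicate_counts = duplicated_rows["id"].value_counts()
print(duplicate_counts)
```

The same two lines should work on the real buildings frame, since a GeoDataFrame supports the usual pandas column operations.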
I can see that keeping an id tag was an intentional choice made in the get_osm_ways_and_relations function of data_manager.pyx: https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_manager.pyx#L104C1-L107C63.
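To illustrate the kind of key collision I suspect is happening, here is a minimal sketch in plain Python. It is not pyrosm's actual code: flatten_element and flatten_element_safe are hypothetical helpers, the way id is made up, and the "tag:id" column name is just one possible convention for keeping the tag value.

```python
def flatten_element(element_id, tags):
    """Hypothetical: merge an element's own id and its tags into one row."""
    row = {"id": element_id}
    # A tag whose key is "id" replaces the element id set above.
    row.update(tags)
    return row


def flatten_element_safe(element_id, tags):
    """Hypothetical: keep the tag value under a different key instead."""
    row = {("tag:id" if key == "id" else key): value for key, value in tags.items()}
    row["id"] = element_id
    return row


way_id = 987654321  # made-up way id, for illustration only
tags = {"building": "yes", "id": "123"}

overwritten = flatten_element(way_id, tags)
preserved = flatten_element_safe(way_id, tags)

print(overwritten["id"])
print(preserved["id"], preserved["tag:id"])
```

In the first helper the element id is lost as soon as the tags are merged in; in the second both values survive under separate keys.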
Is this the expected behaviour? It makes it challenging to guarantee that the ID is unique and comes from the correct OSM source.
Environment:
- OS: Windows 10
- Python package source: PyPI, pyrosm==0.6.2
- Python v3.11.0