ID in building tags overwriting way ID

Open AnBowell opened this issue 10 months ago • 1 comments

When using get_buildings(), the resulting id column (which normally holds the OSM element ID, e.g. the way ID) is overwritten by the value of the id tag when a building has one.

For example, in Dorset, England, there are a few cases where buildings have been tagged with id: 123.

When you then load this data using get_buildings(), the way ID is overwritten with the tag value:

from pyrosm import OSM
FILEPATH = "../data/raw/dorset-latest.osm.pbf"
osm = OSM(FILEPATH)
buildings = osm.get_buildings()

print(buildings[buildings["id"] == 123].head())

Output

       start_date wikipedia   id   timestamp version  \
181257       None      None  123  1703193031       1   
187693       None      None  123  1704833732       1   
193309       None      None  123  1708889862       1   

                                                 geometry  tags osm_type  \
181257  POLYGON ((-2.47921 50.62591, -2.47916 50.62579...  None      way   
187693  POLYGON ((-2.46822 50.66240, -2.46839 50.66240...  None      way   
193309  POLYGON ((-2.47700 50.62089, -2.47700 50.62078...  None      way   

As you can see, people have tagged the buildings with duplicate IDs and these have made their way into the dataframe.
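A quick way to check how widespread the collisions are is to count repeated values in the id column. This is only a minimal sketch: find_duplicate_ids is a hypothetical helper of mine, not part of pyrosm, and it accepts any iterable of IDs.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: count} for every ID that appears more than once.

    Hypothetical helper for illustration; not part of pyrosm.
    """
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up IDs: 123 stands in for the colliding tag value.
sample_ids = [123, 1001, 123, 1002, 123]
duplicates = find_duplicate_ids(sample_ids)
print(duplicates)
```

With the dataframe above, the same helper could be called as find_duplicate_ids(buildings["id"]), since iterating a pandas Series yields its values.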

I can see that keeping an id tag was an intentional choice made in the get_osm_ways_and_relations function of data_manager.pyx: https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_manager.pyx#L104C1-L107C63.

Is this the expected behaviour? It makes it challenging to guarantee that the ID is unique and comes from the correct OSM source.
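To illustrate the concern, here is a simplified sketch of how a tag named id can clobber the element ID when tags are flattened into columns, and one way the two could be kept apart. This is not pyrosm's actual code: flatten_naive, flatten_safe, the tag_ prefix, and the way ID are all made up for the example.

```python
def flatten_naive(element_id, tags):
    """Merge tags straight into the record (illustration only)."""
    record = {"id": element_id, "osm_type": "way"}
    record.update(tags)  # a tag named "id" silently replaces the element ID
    return record


def flatten_safe(element_id, tags, reserved=("id", "osm_type")):
    """Keep reserved columns intact by renaming colliding tag keys."""
    record = {"id": element_id, "osm_type": "way"}
    for key, value in tags.items():
        # Hypothetical convention: prefix any tag that clashes with a reserved column.
        column = f"tag_{key}" if key in reserved else key
        record[column] = value
    return record


way_id = 987654321  # made-up way ID
tags = {"building": "yes", "id": "123"}

naive = flatten_naive(way_id, tags)
safe = flatten_safe(way_id, tags)
print(naive["id"], safe["id"], safe["tag_id"])
```

In the naive version the way ID is lost, while the second version keeps both values, so the id column stays unique per OSM element.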

Environment:

  • OS: Windows 10
  • Python package source: PyPI, pyrosm==0.6.2
  • Python v3.11.0

AnBowell, Apr 09 '24 14:04