pyrosm
Support OSM history files (osh.pbf)
OSH.PBF files contain the full OSM history, including metadata. For some purposes it would be very useful to be able to read them and see how OSM features have changed over time.
Notes:
- Downloading OSH.PBF from Geofabrik requires authenticating to Geofabrik with an OSM account. Hence, this feature would work only with files that the user has downloaded manually (no authentication functionality is planned atm for pyrosm).
- Good memory management is required
Todo:
- [x] Allow reading "HistoricalInformation" feature-type
- [x] Develop supporting methods to handle historical data (versions)
- [ ] Add documentation
Initial ideas on how the historical data could be managed:
Keeping track of the changes based on different versions:
- Use pandas MultiIndex to keep track of different versions of the same feature.
- The newest version of the feature would be listed as the first row for a given group of features?
- Pros: Would make it easy to extract/see a specific version of the same feature
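To make the version-based idea concrete, here is a minimal pandas sketch. It is not pyrosm's implementation: the toy records and the column names `id`, `version` and `name` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy records: two OSM features with several versions each.
records = pd.DataFrame(
    {
        "id": [101, 101, 101, 202, 202],
        "version": [1, 2, 3, 1, 2],
        "name": ["Cafe", "Cafe Aalto", "Cafe Aalto Bar", "Library", "City Library"],
    }
)

# Index by (id, version) and sort so that the newest version comes first
# within each feature.
history = records.set_index(["id", "version"]).sort_index(
    level=["id", "version"], ascending=[True, False]
)

# Extract one specific version of one specific feature.
v2_of_101 = history.loc[(101, 2)]

# Newest version of every feature: the first row of each id group.
latest = history.groupby(level="id").head(1)
```

With this layout, a single `(id, version)` tuple addresses any historical state of a feature, and `head(1)` per group gives the current state.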
Keeping track of the changes based on timestamp:
- Create a DatetimeIndex based on the timestamp of the feature
- Pros: Would make it easy to slice the OSM data into temporal snapshots
Should both of these functionalities be implemented? What kinds of use cases / needs could there be for dealing with historical OSM data?
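A rough sketch of the timestamp-based idea, again with plain pandas rather than pyrosm internals. The toy data, the `snapshot` helper and the `visible` column (standing in for OSM's deleted-version flag) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy history: one row per feature version.
records = pd.DataFrame(
    {
        "id": [101, 101, 101, 202, 202],
        "version": [1, 2, 3, 1, 2],
        "timestamp": pd.to_datetime(
            ["2012-03-01", "2014-06-15", "2016-09-30", "2013-01-10", "2015-05-20"]
        ),
        # Assumed flag: False means this version deleted the feature.
        "visible": [True, True, True, True, False],
    }
)

# DatetimeIndex built from the edit timestamps, sorted chronologically.
history = records.set_index("timestamp").sort_index()


def snapshot(history, when):
    """Return each feature's state as of `when` (hypothetical helper)."""
    when = pd.Timestamp(when)
    up_to = history.loc[:when]            # edits made at or before `when`
    latest = up_to.groupby("id").tail(1)  # most recent edit per feature
    return latest[latest["visible"]]      # drop features deleted by then


snap_2015 = snapshot(history, "2015-01-01 12:00")
snap_2016 = snapshot(history, "2016-01-01")
```

Because the index is sorted, the slice `history.loc[:when]` is cheap, and the same helper can be called repeatedly to build a series of snapshots.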
I was using the time-based snapshot to ensure the reproducibility of code that fetches data from OSM - https://github.com/martinfleis/evolution-gean/blob/main/01_morphometrics.ipynb. So I vote to include the timestamp-based extraction as I would have a use case for it myself :).
Tracking the changes of an individual feature may also be interesting, esp. for an analysis of the changes of OSM precision.
So yeah, I can imagine that both options you mention may be useful.
Thanks @martinfleis for the comment!
I also think that the timestamp-based extraction most likely would have many potential applications, and it would likely be quite intuitive to use as well. 👍🏻 Timestamp indexing could also make it relatively easy to create something like an interactive time-slider functionality built on top of ipyleaflet or hvplot. 🤔
A timestamp-based approach also seems to be used in other implementations, such as ohsome-py: https://heigit.org/ohsome-py-python-package-for-osm-history-analytics-published/
@martinfleis : In the master branch, there is now functionality to extract OSM layers from OSH.PBF based on a given timestamp (see the API reference). I also included a small sample dataset that can be used to test out the feature. The basic workflow goes something like this:
from pyrosm import OSM, get_data
timestamp = "2015-01-01 12:00"
osm = OSM(get_data("helsinki_history_pbf"))
network = osm.get_network(network_type="driving", timestamp=timestamp)
buildings = osm.get_buildings(timestamp=timestamp)
Would you like to give it a test run? 🙂
> Would you like to give it a test run?
It feels a bit slow but apart from that, it works like a charm! Thanks!