WIP: Add osmium support for handling different kinds of files
Still a work in progress; I would appreciate some review/feedback, since some of the naming is not yet clear. Closes: https://github.com/ad-freiburg/pfaedle/issues/10
Thank you very much for your work! I will look over the code in the next few days. How did you change the general workflow in OsmBuilder? Have you run any tests regarding memory consumption and parsing times? Is XML parsing now faster or slower than before?
In general, I am still a bit hesitant to use libosmium here. It's a huge additional dependency. In particular, it introduces Boost as a dependency, which I would like to avoid. If the main goal is to support .pbf files, I still think it would be a better approach to just parse the .pbf files directly. But maybe I am wrong :)
This is still a WIP; so far the only change is that the data is read through libosmium.
How did you change the general workflow in OsmBuilder?
- I didn't change anything regarding the processing flow.
Have you run any tests regarding memory consumption and parsing times?
- not yet
Is XML parsing now faster or slower than before?
- don't know yet
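For the two open questions above, a minimal sketch of how parsing time and peak memory could be compared between the old and the new reader. It assumes a POSIX system (`getrusage`); `RunStats`, `measure` and the `parse` callback are hypothetical names for illustration and not part of pfaedle or libosmium.

```cpp
#include <sys/resource.h>  // getrusage (POSIX only)

#include <chrono>
#include <functional>

// Result of one measured run (hypothetical helper type).
struct RunStats {
  double seconds;   // wall-clock time spent inside the callback
  long peakRssRaw;  // ru_maxrss as reported by the OS (KiB on Linux, bytes on macOS)
};

// Runs `parse` once and reports its wall-clock time together with the
// peak resident set size of the whole process so far.
// `parse` is a placeholder for whatever reads the OSM file.
inline RunStats measure(const std::function<void()>& parse) {
  auto start = std::chrono::steady_clock::now();
  parse();
  auto end = std::chrono::steady_clock::now();

  rusage usage{};
  getrusage(RUSAGE_SELF, &usage);

  return {std::chrono::duration<double>(end - start).count(), usage.ru_maxrss};
}
```

Since `ru_maxrss` is a process-wide peak, each reader would have to be measured in a separate process run for the memory numbers to be comparable.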
The idea was to keep the application "logic" as you have written it, since I still need time to understand in detail what is done there.
Also please don't hesitate to:
- [ ] ask questions regarding why something is done in one way or another
- [ ] suggest improvements to code structure (will try to move some things around and make some things more generic but first I wanted to make things work)
Honestly, it might be good to:
- [ ] better document the classes so that anyone who wants to contribute can get oriented quickly. By documenting I mean using Doxygen or something similar => maybe this can be turned into an issue
- [ ] better document the workflow used
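To make the documentation suggestion concrete, here is a small sketch of what Doxygen-style comments could look like. `ExampleNode` and `ExampleReader` are hypothetical types invented for this illustration; they are not classes from pfaedle or from this PR.

```cpp
#include <cstdint>
#include <string>
#include <utility>

/**
 * @brief Hypothetical value type for one OSM node, used only to show the
 *        comment style.
 */
struct ExampleNode {
  std::int64_t id;  ///< OSM node id
  double lat;       ///< latitude in degrees (WGS84)
  double lon;       ///< longitude in degrees (WGS84)
};

/**
 * @brief Hypothetical reader, again only for illustration.
 *
 * The class-level comment is a good place to say what the class is
 * responsible for and where it sits in the overall workflow.
 */
class ExampleReader {
 public:
  /**
   * @brief Creates a reader for the given file.
   * @param path Path to the input file; the file is not opened here.
   */
  explicit ExampleReader(std::string path) : _path(std::move(path)) {}

  /**
   * @brief Returns the path this reader was created with.
   * @return The path passed to the constructor.
   */
  const std::string& path() const { return _path; }

 private:
  std::string _path;  ///< input file path
};
```

The `@brief`, `@param` and `@return` commands are standard Doxygen; everything else here is placeholder content.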
This is a very practical application that adds a huge benefit for processing GTFS data for agencies that do not generate the shapes for their GTFS feeds. Thanks for developing it; I will try to contribute as much as I can.
One more note: the pull request also contains some clang code improvement suggestions.
@patrickbr I'm curious what's holding this PR back? Is it that you didn't have time/energy/motivation to review this yet, or is it the general direction (e.g. the Boost dependency) that you're unhappy with?
I'm currently map-matching many GTFS feeds using pfaedle (thanks for this tool btw!), and it has to re-read a 12 GB OSM XML file for every GTFS feed. I hope that reading ~700 MB of .pbf would be faster.
I also noticed that pfaedle seems to read this file multiple times, once per matching iteration. In my case, it reads & parses the 12 GB de-bw-buffered.osm file three times. Within Docker for macOS on my old laptop, each read takes ~15min.
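To illustrate what I mean by avoiding the repeated reads, here is a rough sketch of a "parse once, reuse per iteration" cache. It is not based on pfaedle's actual code: `ParsedOsm`, `ParseCache` and the parser callback are hypothetical, and I don't know whether the repeated reading is a deliberate choice to keep memory usage low.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for whatever the parser produces.
struct ParsedOsm {
  std::string sourcePath;
};

// Caches parse results per path so that repeated requests for the same
// file do not trigger another read.
class ParseCache {
 public:
  using Parser = std::function<ParsedOsm(const std::string&)>;

  explicit ParseCache(Parser parser) : _parser(std::move(parser)) {}

  // Returns the cached result for `path`, parsing only on the first request.
  std::shared_ptr<const ParsedOsm> get(const std::string& path) {
    auto it = _cache.find(path);
    if (it != _cache.end()) return it->second;
    auto parsed = std::make_shared<const ParsedOsm>(_parser(path));
    _cache.emplace(path, parsed);
    return parsed;
  }

 private:
  Parser _parser;
  std::map<std::string, std::shared_ptr<const ParsedOsm>> _cache;
};
```

This trades memory for time: the parsed data stays resident between iterations, which may or may not be acceptable for large extracts.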