OpenStreetMap.jl
OpenStreetMap.jl copied to clipboard
Performance Issues
I've found the streaming parser to be too slow, mostly because it's maintaining a complicated state machine, and generating lots of objects during the parsing.
I think it's better to just slurp in the data, first, before dealing with the logic within julia. Here's an example:
With this package:
julia> MAP_FILENAME = "tech_square.osm"
"tech_square.osm"
julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
1.001549 seconds (340.25 k allocations: 1.888 GB, 8.55% gc time)
Under my proposal (implemented here):
julia> @time nodes1 = parseNodes(MAP_FILENAME);
0.032125 seconds (163.66 k allocations: 10.621 MB, 17.01% gc time)
julia> @time ways1 = parseWays(MAP_FILENAME);
0.022908 seconds (141.77 k allocations: 8.979 MB)
The differences become significant when processing the following osm file in julia v0.4:
Before:
julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
2161.987085 seconds (570.21 M allocations: 3.354 TB, 13.85% gc time)
After:
julia> @time nodes = OpenStreetMapParser.parseNodes(MAP_FILENAME);
49.350339 seconds (244.94 M allocations: 14.512 GB, 9.32% gc time)
julia> @time ways = OpenStreetMapParser.parseWays(MAP_FILENAME);
50.805148 seconds (209.75 M allocations: 11.937 GB, 23.16% gc time)
I don't really know how to deal with it in this package though. I'm also leaning towards many other major changes (such as replacing Graphs with LightGraphs, modeling OSM elements more faithfully, and interoperability with DataFrames). Would you prefer for me to get the other package up to speed with most of the functionality here before dropping it in as a replacement, or to maintain a separate fork going forward?
That all sounds great to me -- I'd guess with the major switches, it could be easier to implement and benchmark them in the pared down package at first.
On Thursday, September 24, 2015, Yeesian Ng [email protected] wrote:
I don't really know how to deal with it in this package though. I'm also leaning towards many other major changes (such as replacing Graphs with LightGraphs, modeling OSM elements more faithfully, and interoperability with DataFrames). Would you be interested for me to get the other package https://github.com/yeesian/OpenStreetMapParser.jl up to speed with most of the functionality here before dropping it in as a replacement, or to maintain a separate fork going forward?
— Reply to this email directly or view it on GitHub https://github.com/tedsteiner/OpenStreetMap.jl/issues/72#issuecomment-143009601 .
I feel like I missed something big here. Are you planning to replace this package entirely?
If there is a faster parser available, why not just incorporate it here instead? I can't see anyone arguing against a faster parser that just reads the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think everyone's only preference would be towards something that works well.
I guess my question is, why do you need to fork or start fresh?
I'm on the road, so I haven't followed the link, but it sounded to me like a set of refactorings that would be easier to do (and Api easier to experiment with) with a small core, and this is a bit of a sprawling package. I assumed that would be temporary. Aside: the one item I still wouldn't be in favor of is having data frames interop in the core openstreetmap package.
On Thursday, September 24, 2015, Ted Steiner [email protected] wrote:
I feel like I missed something big here. Are you planning to replace this package entirely?
If there is a faster parser available, why not just incorporate it here instead? I can't see anyone arguing against a faster parser that just reads the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think everyone's only preference would be towards something that works well.
I guess my question is, why do you need to fork or start fresh?
— Reply to this email directly or view it on GitHub https://github.com/tedsteiner/OpenStreetMap.jl/issues/72#issuecomment-143049494 .
Sorry I was in a seminar.
No, I don't have any intentions of replacing this package by registering a different package in METADATA. Any changes I'd like to see will still come in incremental PRs to this package.
But I did need to start afresh, for the reasons sean mentioned: to experiment with a smaller core, and see how changes (that are beyond the scope of a PR) would work out in practice. It will be really draining to do multiple major-but-incremental PRs (I have tried) that change both the parsers and the object models simultaneously, while (i) making sure everything else works, (ii) demonstrating why the changes might be necessary, and (iii) maintaining a tidy commit history.
Aside: the one item I still wouldn't be in favor of is having data frames interop in the core openstreetmap package.
There won't be any dataframes operations in the core openstreetmap package. The OSM package/objects should be designed only to reflect the nature of OSM objects, rather than providing sophisticated data operations. The allusion to dataframes is only to allow users to export/convert OSM objects to dataframes (for functionality that OSM will not provide) if they so desire.
Oh, I misread. That makes sense.
On Thursday, September 24, 2015, Yeesian Ng [email protected] wrote:
Aside: the one item I still wouldn't be in favor of is having data frames interop in the core openstreetmap package.
There wouldn't be any dataframes operations in the core openstreetmap package. The OSM package/objects should be designed only to reflect the nature of OSM objects, rather than providing sophisticated data operations. The allusion to dataframes is to allow users to export/convert OSM objects to dataframes (for functionality that OSM will not provide) if they so desire.
— Reply to this email directly or view it on GitHub https://github.com/tedsteiner/OpenStreetMap.jl/issues/72#issuecomment-143067837 .
Cool, I'm all for it. Thanks for explaining, @yeesian! I was just a little thrown off by some of the text in the scope section your readme. If there's anything I can do to help make your life easier just let me know!
I'm back in front of a computer, and after reading @yeesian's readme, I agree that the package structure he outlines is what we should have in the long term.
Since I think @tedsteiner wants this package to remain a one-stop shop, which makes sense to have, I think that, once the design (apis and boundaries) settles (long term), making those individual packages offshoots / dependencies of this package, and registering those individual packages (for those that aren't served by always having graphing + plotting + ... dependencies in production) is the right goal.
Sounds good to me.
Let me know how I can help with LightGraphs. Specifically, if there's functionality you need that doesn't exist, or you have questions relating to existing functionality, please feel free to open up an issue there so we can discuss the best way to proceed.
So I've got stuff working (with LightGraphs) and running on OpenStreetMapParser, and am using that for my research at the moment. I'm really busy at the moment, and won't have time to start doing PRs to this package for awhile though