pyosmium
pyosmium copied to clipboard
Simplify handler interface
The current concepts of handler classes mimics closely the C++ interface of osmium. There is no need to do this. pyosmium should get a simpler interface that is more in the spirit of python. I'm thinking about something along the way:
import osmium
@osmium.node
def do_stuff(node):
pass
@osmium.way(locations=true)
def do_other_stuff(obj):
pass
@osmium.node
@osmium.way
@osmium.relation
def do_this_for_all_objects(obj):
pass
osmium.apply_file("foo.pdf", do_stuff, do_other_stuff, do_this_for_all_objects)
Looks good to me, just a few comments:
- I'm not sure if functions should be provided as multiple arguments or as one argument (tuple, list etc.). Making all (or almost all) other arguments as keyword only should secure easy future extensions
And I'm thinking, that we could extend the list of decorators to include:
- create copies of objects for those, who are working with sets of data
- input sorting (if locations are required)
- define behaviour on invalid locations (skip, warn or error)
And probably, I'd made @osmium.locations
as separate decorator, which can take index types as arguments.
That's some good comments. Thoughts on that:
- The thought about the multiple arguments was mainly that it saves two more characters. I'm not sure if there is any advantage to one or the other, they are easily convertible.
- I have strong opinions about providing copies, see #125. So, that's a firm no.
- Input sorting is more of a function of the input file, so maybe something like:
osmium.apply(osmium.sorted("foo.pbf"), my_handler)
- Invalid locations and index types are a good points but they are properties of the location handler which is global to the apply function. So we'd have to add the parameters there: `osmium.apply("foo.pbf", my_handler, location_index="sparse_array", location_ignore_errors=False)
In the spirit of this, +1 to the existing behavior of using callbacks as being not-Pythonic.
One very visible issue is that, as it stands now, the library can't really be used from a REPL/IPython/Jupyter notebook... Which is a real bummer for a couple of reasons:
- Using <your favorite REPL/Jupyter> provides a great prototyping/testing environment
- Jupyter notebooks are heavily used in data science applications
- The structure of using callbacks in the way that PyOsmium does is pretty foreign to a Python developer.... I'm sure it makes perfect sense coming from libosmium but that structure doesn't translate into something Pythonic.
Yes, technically you can write your handler class in a Jupyter notebook and then use it, but it's really atypical... Being able to use the library procedurally is important.
Ultimately, part of a goal of any Python library/binding for a library written in C/C++ is not just being able to access the functionality provided by the library but also to be able to utilize the convenience/ease of Python to do that. (The reason Python has taken off as a major language in data science and other fields is definitely not because Python itself is fast -- it isn't. But it's comparatively easy to write native code [especially with things like Cython], compile it, and run complex stuff from Python at near-native speed (aka nearly the speed of pure C/C++).... )
Given that OSM data can be utilized heavily in data science, analysis, GIS applications, etc, I think it's really important to take a step back and make sure that some clear use cases are defined. Opening an OSM PBF dump/extract in a Jupyter notebook and pulling data for analysis is one such use case.
Went a bit of a different way. We have now:
- support for arbitrary number of handlers in apply() (#241)
- any class can be used as a handler (#242, #243)
- an iterative interface, so that a simple for loop can be used(#245)
This should simplify the code enough even without decorators.