pyrosm

pygeos geometry arrays

Open Padarn opened this issue 4 years ago • 2 comments

I was reading the documentation here: https://pyrosm.readthedocs.io/en/latest/benchmarking.html, and specifically the comment on memory usage:

The most memory consuming part currently is constructing Shapely geometries into GeoDataFrame. There might be improvements coming on this once Geopandas starts to support Pygeos geometry arrays.

I'm interested in this because I'm hitting memory limits with some large files. Do you have any further information on the support for this in Geopandas, and/or the plans for support in pyrosm?

I could possibly help contribute if there is a need.

Padarn, Oct 01 '20 04:10

Hi @Padarn and thanks for your message!

Indeed, there is some room for improvement in terms of memory consumption. I have not yet tested how much using the pygeos geometry array would actually help (I assume it helps a bit). Basically, the step where geometries are converted into Shapely geometries could be skipped when the user has Pygeos available with Geopandas 0.8 or newer: https://github.com/HTenkanen/pyrosm/blob/master/pyrosm/geometry.pyx#L73-L76
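
As a rough illustration of the idea only (this is not pyrosm's code): pygeos was later merged into Shapely 2.0, whose vectorized constructors follow the pygeos design, so the sketch below assumes Shapely 2.0+ and NumPy are installed. The coordinates and variable names are made up for the example. It builds all way geometries in a single call from one flat coordinate array, instead of creating one geometry per way in a Python loop.

```python
import numpy as np
import shapely  # assumption: Shapely >= 2.0, which absorbed the pygeos API

# Hypothetical input: node coordinates of all ways stored in one flat array,
# plus an index array telling which way each coordinate belongs to.
coords = np.array(
    [
        [0.0, 0.0], [3.0, 4.0],              # way 0 (two nodes)
        [0.0, 0.0], [1.0, 0.0], [1.0, 1.0],  # way 1 (three nodes)
    ]
)
way_index = np.array([0, 0, 1, 1, 1])

# One vectorized call creates every LineString; no per-way Python loop.
geoms = shapely.linestrings(coords, indices=way_index)

# Vectorized operations work directly on the resulting NumPy array.
lengths = shapely.length(geoms)
```

The result is a NumPy object array of geometries. Whether this actually lowers peak memory for large PBF files would still need to be measured.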

If you have time/interest to take a look at this, I'm very happy to continue discussing it in a PR. 👍 This is in any case one of the areas that I'd like to improve in pyrosm in the future. Ideally there should be a way to handle very large datasets, e.g. in batches, or to support something like Vaex and out-of-core DataFrames (just some initial ideas).
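
To make the batching idea a bit more concrete, here is a minimal sketch using only the standard library. The helper name `iter_batches` and the stand-in data are hypothetical and nothing like this exists in pyrosm as far as this thread says. The point is only that records are consumed lazily in fixed-size chunks, so only one chunk needs to be held in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next chunk from the underlying iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a stream of parsed OSM way IDs (made-up data).
way_ids = range(10)
batch_sizes = [len(batch) for batch in iter_batches(way_ids, batch_size=4)]
```

Each batch could then be turned into geometries and written out (or appended to an out-of-core store) before the next one is read.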

HTenkanen, Oct 11 '20 08:10

Hey @HTenkanen. Yeah, I certainly have some interest, but it might take me a bit of time to get going. I will have a go at this over the next week. Thanks.

Padarn, Oct 16 '20 11:10