pyrosm
pygeos geometry arrays
I was reading the documentation here: https://pyrosm.readthedocs.io/en/latest/benchmarking.html, and specifically the comment on memory usage:
The most memory consuming part currently is constructing Shapely geometries into GeoDataFrame. There might be improvements coming on this once Geopandas starts to support Pygeos geometry arrays.
I'm interested in this because I'm hitting memory limits with some large files. Do you have any further information on support for this in Geopandas, and/or on the plans for support in pyrosm?
I may be able to help contribute if there is a need.
Hi @Padarn and thanks for your message!
Indeed, there is some room for improvement in terms of memory consumption. I have not yet tested how much using the pygeos geometry array would actually help (I assume it helps a bit). Basically, the step where geometries are converted into Shapely geometries could be skipped when the user has Pygeos installed together with Geopandas 0.8 or later: https://github.com/HTenkanen/pyrosm/blob/master/pyrosm/geometry.pyx#L73-L76
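As a rough illustration of the condition described above, here is a minimal sketch of how the "Pygeos plus Geopandas 0.8 or later" check could be written. The function names `major_minor` and `can_skip_shapely_conversion` are hypothetical and not part of pyrosm; the sketch only inspects what is installed and does not touch any geometries. It also checks only the condition named in this thread, so other Geopandas releases may need a different test.

```python
from importlib import metadata, util
from itertools import takewhile


def major_minor(version):
    """Return (major, minor) from a version string such as '0.8.1'."""
    numbers = []
    for piece in version.split(".")[:2]:
        # Keep only the leading digits, so '8rc1' is read as 8.
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def can_skip_shapely_conversion():
    """Hypothetical helper: True if pygeos and geopandas >= 0.8 are installed."""
    # find_spec returns None when a top-level package is not importable.
    if util.find_spec("pygeos") is None or util.find_spec("geopandas") is None:
        return False
    try:
        installed = metadata.version("geopandas")
    except metadata.PackageNotFoundError:
        return False
    return major_minor(installed) >= (0, 8)


if __name__ == "__main__":
    print("Fast path available:", can_skip_shapely_conversion())
```

A caller could branch on this result and fall back to the existing Shapely conversion whenever it returns `False`.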
If you have time/interest to take a look at this, I'm very happy to continue discussing it in a PR. 👍 In any case, this is one of the areas that I'd like to improve in pyrosm in the future. Ideally there should be a way to handle very large datasets, e.g. in batches, or to support something like Vaex and out-of-core DataFrames (just some initial ideas).
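To make the batching idea a little more concrete, here is a minimal, library-agnostic sketch. It is not how pyrosm works today; `iter_batches` and `count_in_batches` are hypothetical names, and the items stand in for whatever records would be parsed from a large file. The point is only that a lazy input can be consumed in fixed-size chunks so that just one chunk is held in memory at a time.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_in_batches(items, batch_size):
    """Process the input batch by batch; here the 'work' is just counting."""
    total = 0
    for batch in iter_batches(items, batch_size):
        total += len(batch)  # a real pipeline would build geometries here
    return total


if __name__ == "__main__":
    records = (i for i in range(10))  # stand-in for a lazily parsed file
    print(count_in_batches(records, 3))
```

In a real pipeline, each batch would be converted and written out (or appended to an out-of-core store) before the next one is read.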
Hey @HTenkanen. Yeah, I'm certainly interested, but it might take me a bit of time to get going. I'll have a go at this over the next week. Thanks.