pyrosm
Not enough memory error when parsing pbf dataset
Hello.
I was trying to parse a PBF file (approx. 65 MB) and I get an "out of memory" error. I have noticed that others have this issue as well. My machine specs (16 GB of RAM, i7 CPU) should be sufficient for parsing this data, I suppose.
I tried to investigate why this memory error occurs on small datasets, and in my case it looks like it comes from the node elements that are returned from parse_osm_data (Cython method) in pyrosm.py. Each node property that is returned is an array of size x (in my case 8000). I couldn't manage to debug the Cython code, so I don't quite understand why this behavior happens. Is there any reason why all these elements are returned only for nodes?
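As a rough way to see the memory growth around the parsing step, something like this can be used (a sketch only: the file path is a placeholder, and psutil is just one convenient way to read the process RSS):

```python
import os

import psutil  # only used here to read the process memory
from pyrosm import OSM

process = psutil.Process(os.getpid())

def rss_mb():
    # Resident set size of the current Python process, in MB
    return process.memory_info().rss / 1024 ** 2

print(f"RSS before parsing: {rss_mb():.0f} MB")

# Placeholder path: replace with the PBF that triggers the error
osm = OSM("mittelfranken-latest.osm.pbf")
edges = osm.get_network(network_type="driving")

print(f"RSS after parsing: {rss_mb():.0f} MB")
print(f"Edge GeoDataFrame alone: {edges.memory_usage(deep=True).sum() / 1024 ** 2:.0f} MB")
```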
Thank you.
Hmm, this sounds weird. A PBF of that size should fit in memory if you have 16 GB of RAM. I assume you don't have anything else running at the same time that would consume a lot of memory?
Could you say which PBF file you tried to read, so I can check whether I can reproduce this behavior on my laptop?
The nodes, ways and relations are indeed parsed with the osm._read_pbf() method ~~but that is not typically the point when the memory is consumed the most (the numpy arrays are relatively memory-efficient)~~, well, yes, looking at this now, this step indeed consumes most of the memory. However, I was able to read a 61 MB PBF file without problems:
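Roughly along these lines (a sketch only, not the exact snippet originally posted; the file name is just an example):

```python
from pyrosm import OSM

# Any extract of comparable size (~60 MB) should behave similarly;
# the path below is only an example
osm = OSM("data/some_region.osm.pbf")

# This reads nodes, ways and relations into memory before filtering
drive_net = osm.get_network(network_type="driving")
print(drive_net.shape)
```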
Yes, I had other services running, but I had around 6-7 GB of memory available and I thought this would be sufficient. This is the data: https://download.geofabrik.de/europe/germany/bayern/mittelfranken-latest.osm.pbf
Yes, unfortunately the way pyrosm currently reads everything into memory as a first step is not a good solution. There is discussion about this topic in many other issues as well, and there are plans to improve the memory handling by changing the logic of how the data is read, which should allow reading much bigger PBF files. However, it will take some time before this gets implemented.
@majkshkurti Which operating system are you working with? If on Unix, adding more swap memory can help a little bit with this issue: https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-20-04/
Hmm, okay, there is actually a significant difference between the memory usage of pyrosm v0.5.3 and the current version. With pyrosm==0.5.3, parsing e.g. the mittelfranken PBF can be done with ease, whereas with the current release my computer also struggles to parse the file.
There have been some changes that might explain this, such as #50, which could be why the numpy arrays now consume much more memory than before. Will look into this!
Update: The data type of the numpy arrays was changed in v0.6.0, which has significantly increased the memory footprint.
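To give a sense of the scale of that effect, here is a quick illustration with plain numpy; the exact dtypes pyrosm switched between are not spelled out above, so the 32-bit vs. 64-bit pair below is only an example:

```python
import numpy as np

n = 10_000_000  # e.g. ~10 million node coordinates

coords_32 = np.random.random(n).astype(np.float32)
coords_64 = coords_32.astype(np.float64)

print(coords_32.nbytes / 1024 ** 2)  # ~38 MB
print(coords_64.nbytes / 1024 ** 2)  # ~76 MB, i.e. the memory doubles per array
```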
I understand. Yes, I am using UNIX. The problem is that sometimes I run this inside a Docker container in which I can't allocate a lot of memory. I will switch to version 0.5.3 for the moment. I will have a look at the source code and think about how to improve the performance. Thanks a lot for the great work.
Ah, yes, sure, with Docker it can indeed be tricky to do anything with swap memory.
I already made some improvements (in #151) which helped to bring the memory footprint down a bit (closer to the level it was at with v0.5.3). I was able to read a 150 MB PBF with 16 GB of memory and another 16 GB of swap.
Can confirm this issue. I also tried to read the mittelfranken and niederbayern PBF files; in both cases the kernel just died without any error message.
Hi @HTenkanen, I think this issue has more to it than just the memory problems. I am currently using this library in a GCP VM where the IPython notebook kernel dies even for the smallest of PBF files (Comores, 3.3 MB), even after raising the RAM size to 500 GB. As per your suggestion I used the 0.5.3 version of the library. However, I can get results if I call certain functions like get_network(), but when I try to do a custom filter it kills the kernel.
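For reference, the two code paths being compared look roughly like this (a sketch only; the file path and filter values are just examples, assuming get_data_by_custom_criteria is the custom-filter entry point):

```python
from pyrosm import OSM

# Small extract; the path is a placeholder
osm = OSM("comores-latest.osm.pbf")

# A call like this returns results fine...
walk_net = osm.get_network(network_type="walking")

# ...whereas a custom filter along these lines is what kills the kernel
# (the filter keys/values here are only an example)
pois = osm.get_data_by_custom_criteria(
    custom_filter={"amenity": ["hospital", "clinic"]},
)
```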