
Out of memory when running the New York data example

Open • gegen07 opened this issue 3 years ago • 3 comments

Hey, I was trying to run the New York dataset example from the documentation, but I ran out of memory.

Config:

  • Colab Pro
  • RAM: 25 GB

This seems odd, since the documentation states that 16 GB of RAM should be enough to parse the New York data.

gegen07 • Oct 29 '21 23:10

Hi @gegen07,

Good point, thanks for raising this! The most likely reason is that the New York State data dump is now approx. 1.8 times larger than it was when the benchmarks were run.
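As a rough back-of-the-envelope check, the arithmetic can be written out in Python. This is a sketch under two loud assumptions: that peak memory scales roughly linearly with the size of the dump, and that the original benchmark really needed close to the full 16 GB. Neither is measured here, and the constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate. ASSUMPTIONS: peak memory grows roughly
# linearly with dump size, and the old benchmark needed close to 16 GB.
DOC_RAM_GB = 16.0     # RAM the documentation says was enough for New York
GROWTH_FACTOR = 1.8   # approx. growth of the New York State dump since then
COLAB_RAM_GB = 25.0   # RAM available in the reporter's Colab Pro session


def estimated_peak_gb(baseline_gb: float, growth: float) -> float:
    """Scale a baseline memory requirement by the growth of the input data."""
    return baseline_gb * growth


estimate = estimated_peak_gb(DOC_RAM_GB, GROWTH_FACTOR)
print(f"estimated need: {estimate:.1f} GB, available: {COLAB_RAM_GB:.0f} GB")
print("fits" if estimate <= COLAB_RAM_GB else "likely out of memory")
```

Under these assumptions the estimate (about 28.8 GB) exceeds the 25 GB available, which would be consistent with the out-of-memory error; real memory use need not scale linearly, though.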

That said, memory management in general is an issue that will be tackled in the future. The goal is to change the internal logic so that it becomes possible to parse larger-than-memory dumps (based on Apache Arrow and Vaex).
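The general idea behind processing larger-than-memory data can be illustrated without Arrow or Vaex: stream the records and reduce them batch by batch, so only one batch needs to be resident at a time. This is a minimal sketch of that streaming idea only, not pyrosm's planned design; the record layout, the `fake_records` reader and the `bounding_box` aggregate are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (node_id, lon, lat). Real OSM/PBF decoding is not shown.
Record = Tuple[int, float, float]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records; only one batch is held at a time."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fake_records(n: int) -> Iterator[Record]:
    """Hypothetical stand-in for a streaming reader: yields records lazily."""
    for i in range(n):
        yield (i, -74.0 + i * 1e-6, 40.7 + i * 1e-6)


def bounding_box(records: Iterable[Record], batch_size: int = 10_000):
    """Count records and compute their bounding box incrementally, batch by batch."""
    count = 0
    min_lon = min_lat = float("inf")
    max_lon = max_lat = float("-inf")
    for batch in batched(records, batch_size):
        lons = [lon for _, lon, _ in batch]
        lats = [lat for _, _, lat in batch]
        min_lon, max_lon = min(min_lon, min(lons)), max(max_lon, max(lons))
        min_lat, max_lat = min(min_lat, min(lats)), max(max_lat, max(lats))
        count += len(batch)  # the batch can be discarded after this point
    return count, (min_lon, min_lat, max_lon, max_lat)


count, bbox = bounding_box(fake_records(25_000), batch_size=10_000)
print(count, bbox)
```

Peak memory here is bounded by `batch_size` rather than by the total number of records; columnar tools such as Arrow apply the same principle with far more efficient storage.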

HTenkanen • Oct 30 '21 09:10

@gegen07: I also found a "bug" that unnecessarily increased the memory footprint of NumPy arrays; see #150.
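For context on why array dtypes matter: the data buffer of a NumPy array is simply element count × item size, so the dtype alone can double or halve the footprint. The sketch below only illustrates that arithmetic; it is not necessarily what #150 changed, and `footprint_mb` is a hypothetical helper.

```python
import numpy as np


def footprint_mb(arr: np.ndarray) -> float:
    """Size of the array's data buffer in megabytes (object payloads not included)."""
    return arr.nbytes / 1e6


n = 1_000_000  # illustrative number of nodes

coords64 = np.zeros((n, 2), dtype=np.float64)  # lon/lat as 64-bit floats
coords32 = coords64.astype(np.float32)         # same shape, half the bytes per element

print(f"float64: {footprint_mb(coords64):.0f} MB")  # float64: 16 MB
print(f"float32: {footprint_mb(coords32):.0f} MB")  # float32: 8 MB
```

Smaller dtypes are a trade-off: `float32` keeps only about seven significant digits, so whether it is acceptable depends on the precision the data needs.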

HTenkanen • Nov 05 '21 16:11

> @gegen07 : Also found a "bug" which has unnecessarily increased the memory footprint of numpy arrays, see #150.

Sorry for the late response. With these updates I've already seen some improvement in memory consumption, but it still runs out of memory.
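To check whether a change actually lowers memory use, one option is to measure peak allocations with Python's built-in `tracemalloc` module. A minimal sketch, where `peak_memory_mb` and `build_list` are hypothetical helpers and `build_list` merely stands in for whatever parsing call is being profiled. `tracemalloc` only counts allocations that go through Python's tracing hooks, so native memory allocated elsewhere may be missed; comparing against the process's resident memory (e.g. Colab's RAM gauge) is a useful cross-check.

```python
import tracemalloc
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def peak_memory_mb(func: Callable[[], T]) -> Tuple[T, float]:
    """Run func() and return its result and the peak traced allocation in MB."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


def build_list() -> int:
    # Hypothetical stand-in workload; replace with the call you want to profile.
    data = [float(i) for i in range(1_000_000)]
    return len(data)


result, peak = peak_memory_mb(build_list)
print(f"items={result}, peak~{peak:.1f} MB")
```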

> But overall the memory management is an issue which will be tackled in the future. The goal is to change the internal logic in such a way that it would be possible to parse larger-than-memory sized dumps (based on Apache Arrow and Vaex).

Would you like any help with this? I'd be more than happy to help tackle it; I think there's a lot to learn in this area.

gegen07 • Jan 05 '22 20:01