geojson-merge icon indicating copy to clipboard operation
geojson-merge copied to clipboard

Fix stream processing

Open mkieselmann opened this issue 5 years ago • 4 comments

While trying to merge a large number of GeoJson files in streaming mode I encountered 2 bugs.

  1. The merge file did not contain all content of the separate files. This seems to be related to parallel stream processing. At least serializing the processing of the separate file streams fixes it.

  2. The contents of the first file argument were missing in the merged file. This was related to the way command line arguments are parsed. -s was parsed as key=value which leads to the fact that the first file was interpreted as value for the s option.

I've tested the changes on macOS with node v13.5.0.

mkieselmann avatar Jan 04 '20 16:01 mkieselmann

I can confirm that I also had issues while trying to merge several (30+) geojson files. The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

zingi avatar Jan 23 '20 18:01 zingi

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}

zingi avatar Jan 24 '20 09:01 zingi

Thanks for the suggestion. I have updated the code.

mkieselmann avatar Jan 24 '20 11:01 mkieselmann

I can confirm that I also had issues while trying to merge several (30+) geojson files. The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}

Same issue here with 1.2GB in 72 files. This fix cleared it right up!

kevinduke10 avatar Sep 21 '20 20:09 kevinduke10