Low level GeoJson implementation
This is a request for either a rewrite or alternative implementation of the current GeoJson module found here. We need a high performance implementation that has a minimal memory footprint. The reasoning behind that is that geojson models should be scalable (eg. downstream we query the map for 1000 of features, the path currently taken to create those and the intermediate objects created for doing that isn't optimal atm).
Couple of thoughts to improve this:
- use primitive instead of objects where possible
- no need for AutoValue, implementation is based on a specification that doesn't change (often)
- if geojson models are immutable, use arrays instead of List
- limit method invocations & don't allocate memory if you can avoid it
- don't over abstract concepts if they can be expressed more simple
More information on performance tips here RFC of GeoJSON can be found here.
cc @mapbox/android
Thank you @tobrun for framing the issue and proposing some solutions. By looking at your list, looks like at list the following items would change the public API of the library:
use primitive instead of objects where possible no need for AutoValue, implementation is based on a specification that doesn't change (often) if geojson models are immutable, use arrays instead of List
Am I reading it right? If so, we need to 1) either wait until the next MAS SEMVER release (4.x) or 2) develop this new library as a new product with a different namespace.
Notes in OP were mostly about the internal management so probably we can keep the same public api though this is something that needs to be determined when scoping out. As first steps I would recommend writing up a test suite to tracks memory allocation / cpu usage for 1 feature up to a million features and compare that with the 2.2.9 version. After, we can scoop out what the impact would be to bring the notes from OP into play and use that same test suite as reference during implementation.
I did a bit of testing on how much memory Point takes in 2.x.x based code vs 3.0 (with AutoValue).
| Number of points | size in 2.x.x mapbox-java | size in 3.0 mapbox.java |
|---|---|---|
| 1 | 48 | 76 |
| 100 | 4800 | 7600 |
| 10000 | 48000 | 76000 |
Also tried comparing loading of sample json with FeatureCollection:
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": { "type": "Point", "coordinates": [ 102, 0.5 ] }, "properties": { "prop0": "value0" } }, { "type": "Feature", "geometry": { "type": "LineString", "coordinates": [ [ 102, 0 ], [ 103, 1 ], [ 104, 0 ], [ 105, 1 ] ] }, "properties": { "prop0": "value0", "prop1": 0 } }, { "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [ [ [ 100, 0 ], [ 101, 0 ], [ 101, 1 ], [ 100, 1 ], [ 100, 0 ] ] ] }, "properties": { "prop0": "value0", "prop1": { "this": "that" } } } ] }
Here is Heap dump for 2.0 and 3.0 based mapbox-java

@osana Thanks for kicking off this work - what's your take on the results in https://github.com/mapbox/mapbox-java/issues/710#issuecomment-389204544 and https://github.com/mapbox/mapbox-java/issues/710#issuecomment-389222070 and what are the next steps?
Is this still in progress?
@zugaldia @tobrun I think the next steps could be as @tobrun points out
- remove AutoValue from geoJson as there is no real need
- If we want to go further, we can modify api for coordinates - that would be a semver-major change though. At the moment we have List<Double> for Point and List<Point>, List<List<Point>> for other Geometries. We can go back to double[] or something else. Ideally the solution should be immutable, which is not the case with either double[] or List<Double>
Thank you @osana. Let's start by having an implementation that doesn't rely on AutoValue to avoid the overhead. This could be based of the MAS v2.x code to speed things up. If SEMVER is a concern, let's build the prototype under a new namespace (e.g. com.mapbox.geojson2) in the same module and package. This comes with the advantage of being able to measure performance of both implementations side by side.
@zugaldia good idea to look at v2.x code and I agree that having two packages next to each other is a good way to compare.
Capturing from @mordag in https://github.com/mapbox/mapbox-gl-native/issues/14555 to look into replacing GSON with Jackson as parser which makes a lot of sense as our data model doesn't change (the geojson RFC makes sure of that).
Android JSON Parsers Comparison: https://medium.com/@IlyaEremin/android-json-parsers-comparison-2017-8b5221721e31
Looks like Jackson would perform best for our needs.
This issue has been automatically detected as stale because it has not had recent activity and will be archived. Thank you for your contributions.
This issue has been automatically detected as stale because it has not had recent activity and will be archived. Thank you for your contributions.