tangram-es icon indicating copy to clipboard operation
tangram-es copied to clipboard

[wip] Early filter/drop unused feature properties

Open hjanetzek opened this issue 6 years ago • 11 comments
trafficstars

We should drop unused feature properties in tilesource parser (in particular 'id') to optimize style matching and styled geometry building by grouping features with identical properties.

This is just a prototype. We should auto-generate the list of used properties by layer and produce the required yaml for inclusion in scene.yaml. Since we cannot know which properties are used by js functions the generated filter could need manual tweaking (unless we e.g. require that filters check for 'key exists' for properties used in js functions)

hjanetzek avatar Dec 10 '18 16:12 hjanetzek

If I understand what you are getting at -- I could see providing hints in the scene YAML, in the data source definition, that specifies which properties (by layer) are required for the style. We'd want this to be strictly optional though (as an optimization for "production-ready" scenes with limited use cases), to preserve existing flexibility for other use cases.

If you are auto-generating, you can still extract some referenced feature properties from JS functions by examining the function source (e.g. feature.propertyName), but of course this is not exhaustive (doesn't work for feature[dynamicPropName] or var alias = feature; return alias.propName. I think this generic capability needs to be retained as the default, but support opt-in optimizations around it.

bcamper avatar Dec 10 '18 16:12 bcamper

Actually, can you say a bit more about the motivation / expected savings for ES here? Is it to improve initial feature parse time from the data source, or reduce on-going memory usage after features are loaded, or speeds binding to JS scene functions, etc.?

On the JS side, my initial expectation would be that while this would reduce memory usage, it's unlikely to have a noticeable impact on scene parse or build time.

bcamper avatar Dec 10 '18 16:12 bcamper

The primary motivation was to reduce the number js function calls for filter and styling functions by grouping features with identical (used) properties. But I assume it will also be noticeable in overall tile-building performance by not having to allocate and memcpy e.g. 'source:openstreetmap' 10000 times (for some tiles).

hjanetzek avatar Dec 10 '18 17:12 hjanetzek

Yeah this will certainly be opt-in - Though it would be nice to have the yaml for source:x:filters auto-generated that can then be imported into the scene used in production

hjanetzek avatar Dec 10 '18 17:12 hjanetzek

So the idea is to group features with identical (used) properties and then cache and reuse the JS function result for them? That could be a compelling advantage, depending on how many features actually do share identical properties. This caching would have to also account for the JS function "keywords" - I can imagine it getting pretty complicated.

As for reducing allocations and copies of tile data, there are other (potentially simpler) ways to solve that, like Hannes' TileDataSink idea.

matteblair avatar Dec 10 '18 17:12 matteblair

@matteblair the grouping could be done on the TileSource side during TileData creation: Just check if a Feature with the same properties was added before then add the geometry to it. So a linestring becomes a multilinestring, etc.

hjanetzek avatar Dec 10 '18 17:12 hjanetzek

(I haven't abandoned to TileDataSink idea - but my plans changed to have the tile-building pipeline require zero copies of any feature properties :)

hjanetzek avatar Dec 10 '18 17:12 hjanetzek

@matteblair I've implemented the feature merging for mvt and did some testing with our internal style adding:

sources:
    mapzen:
        drop_feature_properties: ['name:*', 'source', 'id', 'osm_relation', 'loc_name*', 'alt_name*', 'old_name*', 'int_name*', 'root_id', 'id:right', 'id:left', 'access']

In our test-tile with 6023 features 1460 less need to processed:

//------------------------
// jscore
TileBuilderFixture/TileBuilderBench         286829216 ns  283980792 ns          2
TileBuilderFixture/TileBuilderBench         291913764 ns  288974147 ns          2
TileBuilderFixture/TileBuilderBench         301659424 ns  298690330 ns          2
TileBuilderFixture/TileBuilderBench         309856807 ns  307212129 ns          2
TileBuilderFixture/TileBuilderBench_mean    297564802 ns  294714349 ns          2
TileBuilderFixture/TileBuilderBench_median  296786594 ns  293832238 ns          2
TileBuilderFixture/TileBuilderBench_stddev   10247705 ns   10330571 ns          2

// Drop props
TileBuilderFixture/TileBuilderBench         223607971 ns  221392799 ns          3
TileBuilderFixture/TileBuilderBench         225533173 ns  223777338 ns          3
TileBuilderFixture/TileBuilderBench         230192937 ns  228640216 ns          3
TileBuilderFixture/TileBuilderBench         224984037 ns  223441356 ns          3
TileBuilderFixture/TileBuilderBench_mean    226079529 ns  224312927 ns          3
TileBuilderFixture/TileBuilderBench_median  225258605 ns  223609347 ns          3
TileBuilderFixture/TileBuilderBench_stddev    2859332 ns    3071323 ns          3

//------------------------
// duktape
TileBuilderFixture/TileBuilderBench         488272359 ns  482404560 ns          2
TileBuilderFixture/TileBuilderBench         490298251 ns  486407092 ns          2
TileBuilderFixture/TileBuilderBench         486169248 ns  484802320 ns          2
TileBuilderFixture/TileBuilderBench         490733244 ns  489256989 ns          2
TileBuilderFixture/TileBuilderBench_mean    488868276 ns  485717740 ns          2
TileBuilderFixture/TileBuilderBench_median  489285305 ns  485604706 ns          2
TileBuilderFixture/TileBuilderBench_stddev    2094660 ns    2876146 ns          2

// Drop props
TileBuilderFixture/TileBuilderBench         378153430 ns  376445184 ns          2
TileBuilderFixture/TileBuilderBench         382496444 ns  380752638 ns          2
TileBuilderFixture/TileBuilderBench         385472204 ns  383680265 ns          2
TileBuilderFixture/TileBuilderBench         382495757 ns  380782343 ns          2
TileBuilderFixture/TileBuilderBench_mean    382154459 ns  380415108 ns          2
TileBuilderFixture/TileBuilderBench_median  382496100 ns  380767491 ns          2
TileBuilderFixture/TileBuilderBench_stddev    3013807 ns    2981629 ns          2

So for jscore we gain 60ms and 100ms with duktape. (NB these measurements are not from duktape-optimization branch, and there are some regex filters which do more than 90% nonsense work which jscore seems to be able to optimize a bit)

hjanetzek avatar Dec 17 '18 16:12 hjanetzek

Nice! FYI we do need to keep name:* variants for localization purposes. A more realistic test would be all names but name:en and name (and then swap around the en part based on global?)

nvkelso avatar Dec 17 '18 18:12 nvkelso

@nvkelso this is still prototyping, but one can already override the drop_feature_properties: ['name:*'] by adding keep_feature_properties: global.ux_language_feature_name_keys e.g. which either has to be injected by the app or created from ux_language/fallback in some other way.

hjanetzek avatar Dec 17 '18 21:12 hjanetzek

This is a very interesting idea and I am curious to test something similar on Tangram JS. Given a lot of the operations being saved here are native to the platform on JS, it's unclear if this technique would result in a net positive or negative in that context (feature filtering/matching does consume a significant amount of time for JS feature parsing as on ES, so there is a savings potential), but I am very curious to see what happens!

bcamper avatar Dec 18 '18 15:12 bcamper