tangram-es
[wip] Early filter/drop unused feature properties
We should drop unused feature properties in the tile source parser (in particular 'id') to optimize style matching and styled-geometry building by grouping features with identical properties.
This is just a prototype. We should auto-generate the list of used properties per layer and produce the required YAML for inclusion in scene.yaml. Since we cannot know which properties are used by JS functions, the generated filter may need manual tweaking (unless we e.g. require that filters check for 'key exists' for properties used in JS functions).
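As a rough illustration of the generation step (a sketch only, not code from this branch; the keyword list and function names are assumptions), one could walk the scene's layer filter nodes with yaml-cpp and collect every property key that is not a filter keyword; properties outside that set for a given source would be candidates for the generated drop list, modulo the JS function caveat above:

// Sketch only: collect feature property keys referenced by a scene filter node.
// Keyword list and traversal are assumptions, not the actual tangram-es code.
#include <yaml-cpp/yaml.h>
#include <set>
#include <string>

static const std::set<std::string> filterKeywords = {
    "all", "any", "none", "not", "$zoom", "$geometry"
};

void collectFilterKeys(const YAML::Node& filter, std::set<std::string>& used) {
    if (filter.IsSequence()) {
        for (const auto& item : filter) { collectFilterKeys(item, used); }
    } else if (filter.IsMap()) {
        for (const auto& entry : filter) {
            const std::string& key = entry.first.Scalar();
            if (filterKeywords.count(key)) {
                collectFilterKeys(entry.second, used);  // boolean operator / keyword
            } else {
                used.insert(key);                       // plain feature property
            }
        }
    }
}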
If I understand what you are getting at -- I could see providing hints in the scene YAML, in the data source definition, that specify which properties (by layer) are required for the style. We'd want this to be strictly optional though (as an optimization for "production-ready" scenes with limited use cases), to preserve the existing flexibility for other use cases.
If you are auto-generating, you can still extract some referenced feature properties from JS functions by examining the function source (e.g. feature.propertyName), but of course this is not exhaustive (it doesn't work for feature[dynamicPropName] or var alias = feature; return alias.propName). I think this generic capability needs to be retained as the default, while supporting opt-in optimizations around it.
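For illustration only (not existing code), the static extraction could be as simple as a regex scan of the function source, with exactly the caveats above about dynamic and aliased access:

// Sketch: statically extract property names accessed as feature.<name> from a
// JS scene function's source. Misses feature[expr] and aliased access, so it
// can only add to a keep-list, never prove a property unused.
#include <regex>
#include <set>
#include <string>

std::set<std::string> referencedProperties(const std::string& jsSource) {
    std::set<std::string> props;
    static const std::regex accessRe(R"(\bfeature\s*\.\s*([A-Za-z_$][A-Za-z0-9_$]*))");
    for (auto it = std::sregex_iterator(jsSource.begin(), jsSource.end(), accessRe);
         it != std::sregex_iterator(); ++it) {
        props.insert((*it)[1].str());
    }
    return props;
}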
Actually, can you say a bit more about the motivation / expected savings for ES here? Is it to improve initial feature parse time from the data source, reduce ongoing memory usage after features are loaded, speed up binding to JS scene functions, etc.?
On the JS side, my initial expectation would be that while this would reduce memory usage, it's unlikely to have a noticeable impact on scene parse or build time.
The primary motivation was to reduce the number of JS function calls for filter and styling functions by grouping features with identical (used) properties. But I assume it will also be noticeable in overall tile-building performance by not having to allocate and memcpy e.g. 'source:openstreetmap' 10000 times (for some tiles).
Yeah, this will certainly be opt-in. Though it would be nice to have the YAML for source:x:filters auto-generated, so that it can then be imported into the scene used in production.
So the idea is to group features with identical (used) properties and then cache and reuse the JS function result for them? That could be a compelling advantage, depending on how many features actually share identical properties. This caching would also have to account for the JS function "keywords" - I can imagine it getting pretty complicated.
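To make the "keywords" concern concrete, here is a purely illustrative sketch (the types and names are invented, and the real evaluation context in tangram-es differs): the cache key has to combine the shared property set with whatever per-evaluation state the function can read, e.g. $zoom and $geometry.

// Illustrative only: cache a JS filter/style function result per
// (property group, keyword state) pair. Not existing tangram-es API.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

struct EvalKey {
    uint64_t propsHash;   // hash of the deduplicated (used) property set
    int zoom;             // the $zoom keyword
    int geometryType;     // the $geometry keyword
    bool operator==(const EvalKey& o) const {
        return propsHash == o.propsHash && zoom == o.zoom && geometryType == o.geometryType;
    }
};

struct EvalKeyHash {
    std::size_t operator()(const EvalKey& k) const {
        std::size_t h = std::hash<uint64_t>()(k.propsHash);
        h ^= std::hash<int>()(k.zoom) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<int>()(k.geometryType) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// One cache per JS function; 'evaluate' stands in for the actual call into the JS engine.
template <typename Value, typename Fn>
const Value& cachedEval(std::unordered_map<EvalKey, Value, EvalKeyHash>& cache,
                        const EvalKey& key, Fn&& evaluate) {
    auto it = cache.find(key);
    if (it == cache.end()) {
        it = cache.emplace(key, evaluate()).first;
    }
    return it->second;
}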
As for reducing allocations and copies of tile data, there are other (potentially simpler) ways to solve that, like Hannes' TileDataSink idea.
@matteblair the grouping could be done on the TileSource side during TileData creation: just check whether a Feature with the same properties was added before and, if so, add the geometry to it. So a linestring becomes a multilinestring, etc.
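For illustration, a minimal sketch of that merging step, using simplified stand-in types rather than the real tangram-es Feature/Properties structures:

// Sketch only: merge features with identical (kept) properties while building
// tile data, so repeated geometry accumulates on one feature.
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point { float x, y; };

struct Feature {
    uint64_t propsHash;                        // hash over the kept properties
    std::vector<std::vector<Point>> geometry;  // one entry per merged primitive
};

struct Layer {
    std::vector<Feature> features;
    std::unordered_map<uint64_t, std::size_t> index;  // propsHash -> position in 'features'

    void addLineString(uint64_t propsHash, std::vector<Point> line) {
        auto it = index.find(propsHash);
        if (it != index.end()) {
            // Same (kept) properties seen before: merge into a multi-geometry.
            features[it->second].geometry.push_back(std::move(line));
        } else {
            index.emplace(propsHash, features.size());
            features.push_back({propsHash, {std::move(line)}});
        }
    }
};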
(I haven't abandoned the TileDataSink idea - but my plans changed to have the tile-building pipeline require zero copies of any feature properties :)
@matteblair I've implemented the feature merging for MVT and did some testing with our internal style, adding:
sources:
    mapzen:
        drop_feature_properties: ['name:*', 'source', 'id', 'osm_relation', 'loc_name*', 'alt_name*', 'old_name*', 'int_name*', 'root_id', 'id:right', 'id:left', 'access']
In our test tile with 6023 features, 1460 fewer need to be processed:
//------------------------
// jscore
TileBuilderFixture/TileBuilderBench 286829216 ns 283980792 ns 2
TileBuilderFixture/TileBuilderBench 291913764 ns 288974147 ns 2
TileBuilderFixture/TileBuilderBench 301659424 ns 298690330 ns 2
TileBuilderFixture/TileBuilderBench 309856807 ns 307212129 ns 2
TileBuilderFixture/TileBuilderBench_mean 297564802 ns 294714349 ns 2
TileBuilderFixture/TileBuilderBench_median 296786594 ns 293832238 ns 2
TileBuilderFixture/TileBuilderBench_stddev 10247705 ns 10330571 ns 2
// Drop props
TileBuilderFixture/TileBuilderBench 223607971 ns 221392799 ns 3
TileBuilderFixture/TileBuilderBench 225533173 ns 223777338 ns 3
TileBuilderFixture/TileBuilderBench 230192937 ns 228640216 ns 3
TileBuilderFixture/TileBuilderBench 224984037 ns 223441356 ns 3
TileBuilderFixture/TileBuilderBench_mean 226079529 ns 224312927 ns 3
TileBuilderFixture/TileBuilderBench_median 225258605 ns 223609347 ns 3
TileBuilderFixture/TileBuilderBench_stddev 2859332 ns 3071323 ns 3
//------------------------
// duktape
TileBuilderFixture/TileBuilderBench 488272359 ns 482404560 ns 2
TileBuilderFixture/TileBuilderBench 490298251 ns 486407092 ns 2
TileBuilderFixture/TileBuilderBench 486169248 ns 484802320 ns 2
TileBuilderFixture/TileBuilderBench 490733244 ns 489256989 ns 2
TileBuilderFixture/TileBuilderBench_mean 488868276 ns 485717740 ns 2
TileBuilderFixture/TileBuilderBench_median 489285305 ns 485604706 ns 2
TileBuilderFixture/TileBuilderBench_stddev 2094660 ns 2876146 ns 2
// Drop props
TileBuilderFixture/TileBuilderBench 378153430 ns 376445184 ns 2
TileBuilderFixture/TileBuilderBench 382496444 ns 380752638 ns 2
TileBuilderFixture/TileBuilderBench 385472204 ns 383680265 ns 2
TileBuilderFixture/TileBuilderBench 382495757 ns 380782343 ns 2
TileBuilderFixture/TileBuilderBench_mean 382154459 ns 380415108 ns 2
TileBuilderFixture/TileBuilderBench_median 382496100 ns 380767491 ns 2
TileBuilderFixture/TileBuilderBench_stddev 3013807 ns 2981629 ns 2
So, going by the means above, we gain roughly 70 ms with JSCore and roughly 100 ms with duktape. (NB: these measurements are not from the duktape-optimization branch, and there are some regex filters that do more than 90% unnecessary work, which JSCore seems to be able to optimize a bit.)
Nice! FYI we do need to keep the name:* variants for localization purposes. A more realistic test would drop all name variants except name:en and name (and then swap out the 'en' part based on a global?)
@nvkelso this is still prototyping, but one can already override drop_feature_properties: ['name:*'] by adding e.g. keep_feature_properties: global.ux_language_feature_name_keys, which would either have to be injected by the app or created from ux_language/fallback in some other way.
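To sketch the resulting precedence (the rule that keep patterns win over drop patterns, and treating a trailing '*' as a prefix wildcard, are my illustrative reading of the prototype; the helper names are made up):

// Self-contained sketch of keep/drop precedence; the rule (keep wins) and the
// prefix-wildcard handling are assumptions, not the literal code from the branch.
#include <string>
#include <vector>

static bool matchKey(const std::string& key, const std::string& pattern) {
    if (!pattern.empty() && pattern.back() == '*') {
        return key.rfind(pattern.substr(0, pattern.size() - 1), 0) == 0;  // prefix match
    }
    return key == pattern;
}

bool keepProperty(const std::string& key,
                  const std::vector<std::string>& keepList,
                  const std::vector<std::string>& dropList) {
    for (const auto& p : keepList) { if (matchKey(key, p)) { return true; } }
    for (const auto& p : dropList) { if (matchKey(key, p)) { return false; } }
    return true;
}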
This is a very interesting idea and I am curious to test something similar in Tangram JS. Since many of the operations being saved here are native to the platform in JS, it's unclear whether this technique would be a net positive or negative in that context (feature filtering/matching does consume a significant amount of time during JS feature parsing, as it does in ES, so there is savings potential), but I am very curious to see what happens!