vizicities Merging, sharing and modifying data with the Blueprint API

Open brianchirls opened this issue 9 years ago • 22 comments

In my experiments with Vizicities and the Blueprint API, I've come upon the need for modifying the way data flows between inputs and outputs. I might, for example, want to write an output module that uses the same data as the building tiles output but does something else with it. Or I might want to have a middleware-esque filter to modify the data between the input and the output. It could also be useful for merging two or more data sources into a single output. For example, I might want to render a choropleth based on census data but modify it according to weather data from another source.

I can think of a few solutions to suggest:

1. Build input data caching into either the individual input modules or into the switchboard itself. This is probably a good idea anyway, but it has the Hard Thing (per Phil Karlton) that is cache invalidation. It also mostly addresses the case where you're moving away from a tile and then back to it a moment later. I think there are better ways to optimize for the case where two or more outputs need the same tile data at the same time.
2. Build input and output modules that serve as proxies for multiple other inputs and outputs. This has the advantage of not having to modify the core code right now, but as it's only version 0.2.0, it's a very small advantage. It's not terribly efficient in terms of coding time (nor probably CPU time), and it could be error-prone, especially when the core internals change over time.
3. Make Blueprint into a full data flow architecture, as a network of nodes with inputs, outputs and filters in between. You could chain filters together and merge just about any two or more kinds of data. This is a bit more ambitious, but it would thoroughly address the above problems and more. Everything could be very modular and pretty efficient, I think. For example, you could easily write a custom node to omit buildings based on an arbitrary rule - let's say, anything under a certain height.
4. Allow inputs and outputs to be specified as arrays. This would address the sharing problem without too much work to overhaul everything. But it would not allow merging or filtering of data.
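
To make the data flow idea above (a network of nodes with filters in between) a bit more concrete, here is a minimal sketch. It is not the Blueprint API: `createNode`, `connect`, `send` and `minHeightFilter` are hypothetical names invented for illustration, and the building objects are made-up sample data.

```javascript
// Minimal dataflow sketch: each node receives data, optionally transforms it,
// and forwards the result to every connected downstream node.
// All names here are hypothetical and not part of the Blueprint API.
function createNode(transform) {
  const outputs = [];
  return {
    // Connect this node to a downstream node; returns the target for chaining.
    connect(target) {
      outputs.push(target);
      return target;
    },
    // Push data into this node and on to everything downstream.
    send(data) {
      const result = transform ? transform(data) : data;
      outputs.forEach((output) => output.send(result));
    }
  };
}

// A filter node that drops buildings below a minimum height.
function minHeightFilter(minHeight) {
  return createNode((buildings) => buildings.filter((b) => b.height >= minHeight));
}

// One input shared by two outputs, one of them behind the filter.
const received = { all: null, tall: null };
const input = createNode();
const allOutput = createNode((data) => { received.all = data; return data; });
const tallOutput = createNode((data) => { received.tall = data; return data; });

input.connect(allOutput);
input.connect(minHeightFilter(10)).connect(tallOutput);

input.send([{ id: 'a', height: 4 }, { id: 'b', height: 25 }]);
```

Here the same input feeds both outputs, which covers the sharing case, and the filter sits between input and output like middleware. Merging two inputs would need a node that waits for data from more than one upstream node, which this sketch leaves out.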

I just want to get a discussion going. Maybe it makes sense to start with 4 and then see how it goes before moving on to 3.

Dec 10 '14 15:12 brianchirls

The ability to merge data sources could facilitate 3D terrain (#46).

One challenge I foresee is rendering a building that exists on a slope or with any kind of elevation differential. You wouldn't want the roof rendered at an angle. I would guess that you loop through all the points of the building outline, calculate the lowest elevation and then start the building from there.

You'd need to incorporate the elevation data into the building data before the buildings are turned into geometry buffers...
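
A rough sketch of the "lowest elevation of the outline" idea, assuming the outline is an array of `[x, y]` points and that some elevation lookup exists. `baseElevation` and `elevationAt` are hypothetical names, and the planar slope is made-up test data rather than real terrain.

```javascript
// Find the lowest terrain elevation under a building outline, so the building
// can be extruded upwards from a flat base instead of following the slope.
// `elevationAt(x, y)` is a hypothetical lookup supplied by the caller.
function baseElevation(outline, elevationAt) {
  if (outline.length === 0) {
    throw new Error('outline has no points');
  }
  return outline.reduce(
    (lowest, point) => Math.min(lowest, elevationAt(point[0], point[1])),
    Infinity
  );
}

// Example: a made-up slope that rises 0.5 units per unit of x.
const slope = (x, y) => 100 + 0.5 * x;
const footprint = [[0, 0], [10, 0], [10, 10], [0, 10]];
const base = baseElevation(footprint, slope);
```

This only samples the outline's vertices, so a dip in the terrain between two vertices would be missed.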

Dec 10 '14 17:12 brianchirls

Improving the Blueprint API is incredibly important and really high on the list of things to do. It's still at a very early stage, and hitting issues like multiple inputs/outputs, merging and processing has shown that there's a lot of room for improvement.

1. Some kind of cache is definitely needed, at least for keeping a reference to nearby tiles in case you go back to them in the future.
2. This is my least favourite of the options and I'm completely OK with major changes to get things right. I'd much rather do a larger change now than further down the line, especially while the Blueprint API is so young.
3. This is very appealing and I can totally see the benefit. It's actually quite similar to the original intention of the Blueprint API, in that you have a set of modules that you can kind of connect together as you wish. This is my favourite option, though it depends on my point at the end of this comment.
4. This is the quick-fix I was going to implement sometime soon to at least address the use of triggering multiple inputs/outputs in one go.

It's important to be clear about the core purpose of the Blueprint API. The one thing that it always has to keep is the ability to be configured by JSON, simply because the idea is to eventually build these configurations via a GUI / tooling infrastructure and I really don't want to start getting dirty with eval() and generating complex code composition on the fly. This is why the current implementation can seem a little clunky, as the purpose is to be able to manipulate it with text rather than pure JS.

I'd be keen to talk more about the third idea and how that might look in an implementation that's configurable via JSON/text.

On a related note, one thing that needs improving with the Blueprint API is the ability to write more complex transformations that can pull values from various parts of the trigger data. For example, right now it's impossible to do multiple loops through different parts of the trigger data, so things like multi-part features (eg. KML) can't be implemented yet. It's possible this could be treated in another issue, though it sounds like it might fall into the vision for your third idea.

Dec 10 '14 19:12 robhawkes

I agree with everything you said, and I don't think any of it should be a problem. But defining complex data transforms declaratively can be tricky, and I haven't done much of it.

I think you'll want to have a good API for setting up (and possibly modifying?) the network, as well as a way to export and import the whole thing as JSON.

I've done something similar over at Seriously.js, but the JSON import/export code is a mess and off in an old branch somewhere. And you'll want Blueprint nodes to pass data asynchronously. See http://strawjs.com/ for inspiration.

Dec 10 '14 20:12 brianchirls

It's not going to be easy for sure, but hopefully we can end up with a suitable API that's powerful enough for those who want to construct things by hand, and flexible enough to be visualised and outputted via some kind of GUI / tooling.

I don't think it'll be perfect first time around but it'll sure be a lot better than the current version now we better understand the limitations. I don't have an API or structure in my head just yet so I'll mull things over and see what comes up.

Let me know if you have any thoughts on what it could look like, or just on things we should think about or take inspiration from (I'll check out StrawJS properly soon).

It may also be worth considering the opposite end of the problem too. Perhaps there's a way to generate pure JS configurations dynamically via a GUI / tooling while keeping things safe (eval, etc) and making the config easy to store somewhere too (eg. in a file, or a database). Solving this, or at least getting me feeling more comfortable with it might make things a whole bunch easier in the long run.

Dec 10 '14 22:12 robhawkes

There are a bunch of state things and useful events that I have on each node in Seriously.js. I can talk you through it at some point, but you probably don't need to worry about them right away. They're things I added over a few years.

For an API, you could probably give each node a method or two to get or set any incoming and outgoing connections to other nodes. That way, the whole data structure will nicely represent the network. You could, in theory, export a dummy object that mimics the same graph structure, but you can't accurately stringify it because you might have one node referenced multiple times.

In that case, I can think of two options (I prefer the second, I think):

1. Break your JSON into two sections: a list of nodes and a list of edges
2. Save a list of nodes, and make sure each node has an id string (either given or generated). Each node will have a list of its incoming nodes, referenced by that id.

Also, I'm pretty sure you'll want to make sure the graph is directed and acyclic, i.e. no loops.
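
As a sketch of the second option, here is one possible shape for the JSON, plus a check that the graph has no loops. The field names (`id`, `type`, `inputs`) and the function `executionOrder` are assumptions for illustration, not an existing Blueprint format. The check is Kahn's topological sort, which is a standard way to detect cycles but not something the thread settled on.

```javascript
// Each node has an id, a type, and the ids of its incoming nodes.
// Field names and node ids are hypothetical.
const graph = {
  nodes: [
    { id: 'buildings', type: 'input', inputs: [] },
    { id: 'terrain', type: 'input', inputs: [] },
    { id: 'merge', type: 'filter', inputs: ['buildings', 'terrain'] },
    { id: 'tiles', type: 'output', inputs: ['merge'] }
  ]
};

// Returns node ids in an order where every node comes after its inputs
// (Kahn's algorithm). Throws on an unknown id or if the graph has a cycle.
function executionOrder(graph) {
  const remaining = new Map();  // id -> number of inputs not yet processed
  const downstream = new Map(); // id -> ids of nodes that consume it
  graph.nodes.forEach((node) => {
    remaining.set(node.id, node.inputs.length);
    downstream.set(node.id, []);
  });
  graph.nodes.forEach((node) => {
    node.inputs.forEach((inputId) => {
      if (!downstream.has(inputId)) {
        throw new Error('Unknown node id: ' + inputId);
      }
      downstream.get(inputId).push(node.id);
    });
  });

  const ready = graph.nodes.filter((n) => n.inputs.length === 0).map((n) => n.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    downstream.get(id).forEach((nextId) => {
      remaining.set(nextId, remaining.get(nextId) - 1);
      if (remaining.get(nextId) === 0) ready.push(nextId);
    });
  }
  if (order.length !== graph.nodes.length) {
    throw new Error('Graph contains a cycle');
  }
  return order;
}

// Because nodes refer to each other by id, the structure survives a JSON round trip.
const restored = JSON.parse(JSON.stringify(graph));
const order = executionOrder(restored);
```

Referencing nodes by id avoids the stringify problem mentioned above, since a node used by several others is stored once and only its id is repeated.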

Dec 11 '14 04:12 brianchirls

Lots for me to look into! Hopefully I can set some time aside soon to give this some proper thought and consider how the various options may look. I'm not hugely familiar with node graphs so that may hinder things slightly. In any case, I'm sure we'll get there.

I've been thinking about representing the Blueprint API in pure JS rather than text-based JSON and the issues that may bring up (having to eval, etc). I was thinking that using Web Workers to unpack and process these pure JS configurations may reduce potential for bad things to happen. Would need to consider the ramifications of offloading that all into a worker though, like communication back and forth between Blueprint and the rest of the system becoming convoluted.

Dec 11 '14 18:12 robhawkes

I'm not advocating for use of eval. eval stinks. I'm just saying that you might also have an API for manipulating the node graph. But the "state" of the graph (i.e. all the nodes, the edges and any other configuration data) should be representable in JSON.

Being able to load up a whole switchboard from a JSON file is pretty useful.

Dec 12 '14 11:12 brianchirls

Ah ok, I like the JSON approach too but so far it's proven a right pain when trying to describe complex mapping and filter functions. Perhaps splitting things into a node graph will help by having each node only do one thing and thus not requiring such complex config. I still think it might be a problem though (eg. running loops over different levels of the input data and condensing into a single output is impossible to even define with the current approach).

I'm excited to put some thought into this and to see what a better approach looks like.

Dec 12 '14 12:12 robhawkes

I think there are two separate issues here. The first is representing the node graph with JSON, which I think is not too hard. Maybe not trivial, but totally within grasp. The other is representing complex transforms with JSON, which is a bigger pain in the butt.

I ran into this problem yesterday when trying to use your existing built-in transform capabilities to make a KML from the Android "My Tracks" app work with BlueprintOutputDebugLines. I just couldn't get it done. (Converting to GPX instead worked.)

I think what you'd need to really cover all cases is some JSON equivalent of XSLT. I've run across one or two libraries that attempt something like that, and I haven't found anything really satisfying. I suspect it might be more trouble than it's worth for the purposes of vizicities, and perhaps a full node graph will make it sufficiently easy to write custom filters for any really complex transforms.

Dec 12 '14 12:12 brianchirls

You're right, it's definitely 2 separate problems. Basically, if a new approach (like a node graph) solves the complex transform problem as well (which it totally could with clever use or combination of nodes) then it might be more of a round-trip but it's win win from my perspective. It's been gnawing at my brain since creating the Blueprint API in the first place.

Dec 12 '14 12:12 robhawkes

I want to throw a couple ideas in here for filter nodes so I can get them out of my head before I forget them.

Merging building data with terrain data
Merging a path (GPX, KML, etc) with road data for map matching
Merging per-region data with a shapefile. e.g. one file/feed of income data for every postal code in a given country and a separate data source for outlines of postal codes.
Arbitrary transforms of building models or selective removal of buildings (e.g. scale 'em up by a given factor; remove any outside/within a given height or area range)
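
As one illustration, the per-region merge could be a plain join on a shared key. This is only a sketch: `joinByKey`, the `postcode` field and the sample rows are all made up, and a real filter node would also have to deal with missing or mismatched regions.

```javascript
// Merge per-region rows (e.g. income figures) into region outlines that
// share the same key. Outlines without a matching row are dropped.
// Function name, field names and data are hypothetical.
function joinByKey(features, rows, key) {
  const rowsByKey = new Map(rows.map((row) => [row[key], row]));
  return features
    .filter((feature) => rowsByKey.has(feature[key]))
    .map((feature) => Object.assign({}, feature, rowsByKey.get(feature[key])));
}

const outlines = [
  { postcode: 'N1', outline: [[0, 0], [1, 0], [1, 1]] },
  { postcode: 'E2', outline: [[2, 2], [3, 2], [3, 3]] }
];
const incomes = [{ postcode: 'N1', income: 32000 }];

const merged = joinByKey(outlines, incomes, 'postcode');
```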

Dec 12 '14 12:12 brianchirls

What do you mean by round-trip?

Dec 12 '14 12:12 brianchirls

By round-trip I mean that you may need to join up multiple nodes (and do some extra looping) to achieve something that could have been written in a single crazy transform. I suppose a custom node could be written to do the crazy transform but then you have the same problem re: JSON representation.

Good idea with the use-cases for nodes. I'll have a think about some of the areas I've hit and others have had trouble with that nodes could help solve – I'll jot them down in here.

Dec 12 '14 12:12 robhawkes

Ah, you mean like going back to the data a couple times. Yeah, true. I have two concerns about this.

First is performance, of course. If you're dealing with byte/float arrays, the JS engine should optimize pretty well and/or you could run it in a worker and transfer the data without copying. But I suspect most of it will be large arrays of objects with a bunch of string data. A better practice might be to break large tasks into smaller chunks. That is, loop through for a while, until you've run for a couple of milliseconds, and then call setTimeout and pick up where you left off. That way you don't block the UI or requestAnimationFrame. Maybe try this with a few nodes and then if it's useful, build it in as a utility.
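
A sketch of that chunking idea as a small utility. `processInChunks` and its option names are hypothetical. The 4 ms default budget is an arbitrary assumption, and the `schedule` option exists only so the yielding step can be swapped out.

```javascript
// Process `items` in time-limited chunks so a long loop doesn't block the UI.
// After roughly `budgetMs` of work it yields via setTimeout and later resumes
// where it left off. All names and defaults here are hypothetical.
function processInChunks(items, handleItem, onDone, options) {
  const opts = options || {};
  const budgetMs = opts.budgetMs === undefined ? 4 : opts.budgetMs;
  const schedule = opts.schedule || ((fn) => setTimeout(fn, 0));
  let index = 0;

  function runChunk() {
    const start = Date.now();
    while (index < items.length) {
      handleItem(items[index], index);
      index += 1;
      // Stop this chunk once the time budget is used up.
      if (Date.now() - start >= budgetMs) break;
    }
    if (index < items.length) {
      schedule(runChunk); // yield to the event loop, then continue
    } else {
      onDone();
    }
  }

  runChunk();
}

// Usage with made-up building data.
const heights = [];
processInChunks(
  [{ height: 12 }, { height: 30 }, { height: 7 }],
  (building) => heights.push(building.height),
  () => console.log('processed', heights.length, 'buildings')
);
```

Each chunk always handles at least one item, so the loop makes progress even with a very small budget.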

The second concern is how to specify what kind of data each node expects as input and emits as output. I've been thinking about something like this for another project. Honestly, for data as complex as this, I don't know that there is a good solution. Maybe some best practices for data structures will emerge over time, but for now I think we'll probably just have to wing it.

Dec 12 '14 13:12 brianchirls

Winging it works for me. We can always tweak and improve as things progress and we suddenly discover the perfect way to do it.

As for a use for a better system, one issue people have hit is how to combine data from a number of places within the same transform. For example, having a KML that includes multiple Placemark elements that you want to iterate over in the transform, that each contain a MultiGeometry element that itself contains multiple Polygon or Point elements that you also want to iterate over. Effectively, transforms over multi-dimensional arrays and complex objects.

In the above case, one potential approach using the current Blueprint API was to implement a set of tags that you could define in the transform string to indicate where to run another loop. For example:

itemsProperties: "document.folder.placemark",
transformation: {
  coordinates: "multigeometry.polygon[{n}].etc"
}

Where {n} indicates where to loop again – it's not perfect but it would probably work.
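
To show how such a `{n}` tag might be interpreted, here is a rough sketch of a path resolver. The tag was only a proposal at this point, so the semantics are an assumption: a segment ending in `[{n}]` loops over every element of that array and the results are flattened. `resolveAll` and the sample placemark are hypothetical.

```javascript
// Resolve a dotted path against `data`. A segment ending in "[{n}]" means
// "loop over every element of this array"; matches are collected in order.
// Missing keys are skipped rather than treated as errors.
function resolveAll(data, path) {
  const loopTag = '[{n}]';
  let current = [data];
  path.split('.').forEach((segment) => {
    const loops = segment.endsWith(loopTag);
    const key = loops ? segment.slice(0, -loopTag.length) : segment;
    const next = [];
    current.forEach((value) => {
      if (value === null || value === undefined) return;
      const child = value[key];
      if (child === undefined) return;
      if (loops && Array.isArray(child)) {
        child.forEach((item) => next.push(item));
      } else {
        next.push(child);
      }
    });
    current = next;
  });
  return current;
}

// Made-up placemark with two polygons inside a multigeometry.
const placemark = {
  multigeometry: {
    polygon: [
      { coordinates: '0,0 1,0 1,1' },
      { coordinates: '2,2 3,2 3,3' }
    ]
  }
};
const coords = resolveAll(placemark, 'multigeometry.polygon[{n}].coordinates');
```

With this reading, a path without any `{n}` tag still returns an array, just with a single entry.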

I'm still unsure whether this is related to or would be solved by the bigger issue here.

Dec 12 '14 13:12 robhawkes

Yeah, I'm not sure. But it's starting to look a little like XSLT, if memory serves. I haven't looked at it in a few years because...XML...blech.

Dec 12 '14 13:12 brianchirls

Uh oh, that doesn't sound good! Still, it's worth me having a look at to see if it's appropriate or what I can learn from it.

Dec 12 '14 13:12 robhawkes

Don't worry, there are some good parts. Like XPath - not unlike CSS selectors.

Dec 12 '14 13:12 brianchirls

Dumping a few links that I've come across in my research into a JS/JSON equivalent for XSLT:

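JSONT (http://goessner.net/articles/jsont/) – Particularly the two-dimensional array example at the bottom
Tempo (http://twigkit.github.io/tempo/) – I like the Tempo.prepare(template).render(data); approach
Jolt (https://github.com/bazaarvoice/jolt) – Written in Java for some reason but looks interesting
jq (http://stedolan.github.io/jq/) – On the command line but the API is quite simple and seemingly powerful (http://stedolan.github.io/jq/tutorial/)
JSONiq (http://www.jsoniq.org/) – Looks a bit mental but allows for complex transforms (http://www.jsoniq.org/docs/JSONiq-usecases/html-single/index.html#json2json)
json2json (https://github.com/joelvh/json2json) – Looks simple enough, though not sure it can handle the complex use-cases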

And finally, XSLT itself shouldn't be ruled out. After looking into it more it certainly looks powerful and the benefit is that it's already known and well documented. Looks like it's not unheard of to use XSLT to transform JSON.

Dec 12 '14 14:12 robhawkes

Maybe it makes sense to break object transformation into a separate ticket.

One option is to build a filter node module that incorporates one of those transformation libraries. That way, you can load up whichever one you want as a separate script, and you don't have to include all that into the distribution.

Dec 12 '14 14:12 brianchirls

window.XSLTProcessor

https://developer.mozilla.org/en-US/docs/Transforming_XML_with_XSLT

Dec 12 '14 14:12 brianchirls

I've moved the transformation discussion over to #120 so we can keep this related to the changes needed to create a better flow and control of data throughout the Blueprint API (eg. node graphs).

I couldn't help but look into this on the train last night and I came across some promising options to explore deeper:

NoFlo – This looks amazing (and I love the visual GUI) but I've not looked at the docs yet. Also has a(nother) GUI that might be worth looking into.
dataflo.ws – Not looked into this deeply
Straw – As highlighted by @brianchirls. I like the tutorials though
Dataflow – Updated recently but unsure on possibilities

Dec 13 '14 10:12 robhawkes
