stroom Make off heap ref data queryable via dashboard/API

It would be useful to be able to search the various K=>V maps stored in the off heap ref store via the dashboard and the API.

That is, a user could get the value for a given map and key, or get all KVs for a map.

Thought will need to be given to:

how to handle values that are FastInfoSet serialised XML fragments.
how we represent a datasource for the ref store. Do we need a datasource entity for each map in the ref store (such that the query just filters by key), or one for the whole ref store (such that the query must filter on map + key)?

Jun 10 '20 14:06 at055612

Also, the datasource would need some way for the user to set the date used for the lookup.

I suspect one datasource to cover all maps would be preferable as maps are created dynamically during ref loading. It could then have fields of:

map name
key
value
effective time
lookup time (query condition only)
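A minimal sketch of the single-datasource idea, assuming ref store entries can be modelled as plain Java records. `RefStoreField`, `RefStoreRow` and `RefStoreQuery` are hypothetical names, not existing Stroom classes. It shows the consequence of one datasource covering all maps: every query filters on map name as well as key, and lookup time is a condition only, never a returned column.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical field set for ONE datasource spanning every map in the ref store.
enum RefStoreField {
    MAP_NAME(false),
    KEY(false),
    VALUE(false),
    EFFECTIVE_TIME(false),
    LOOKUP_TIME(true); // query condition only, never returned as a column

    final boolean conditionOnly;

    RefStoreField(boolean conditionOnly) {
        this.conditionOnly = conditionOnly;
    }
}

// Hypothetical row shape as a dashboard might receive it.
record RefStoreRow(String mapName, String key, String value, Instant effectiveTime) {}

final class RefStoreQuery {
    private RefStoreQuery() {}

    // With one datasource for all maps, the map name is always part of the filter.
    // A null key means "all KVs for the map".
    static List<RefStoreRow> filter(List<RefStoreRow> rows, String mapName, String key) {
        return rows.stream()
                .filter(r -> r.mapName().equals(mapName))
                .filter(r -> key == null || r.key().equals(key))
                .toList();
    }

    // Columns a dashboard table could show: everything except condition-only fields.
    static List<RefStoreField> returnableFields() {
        return Arrays.stream(RefStoreField.values())
                .filter(f -> !f.conditionOnly)
                .toList();
    }
}
```

A per-map datasource would drop `MAP_NAME` from the filter, but since maps are created dynamically during loading, each new map would then need its own datasource entity.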

I'm not sure the current code can do lookups by value, but in theory it could do so by (inefficiently) scanning the whole map.
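A sketch of what a by-value lookup amounts to, assuming the entries of one map can be iterated. A plain `java.util.Map` stands in for the off heap store here, and `ValueScan` is a hypothetical helper. Because the store is keyed by map and key, a value match has to visit every entry, so the cost grows linearly with the size of the map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ValueScan {
    private ValueScan() {}

    // Hypothetical reverse lookup: O(n) full scan over every KV in the map,
    // collecting the keys whose value matches.
    static List<String> keysForValue(Map<String, String> mapEntries, String wantedValue) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : mapEntries.entrySet()) {
            if (entry.getValue().equals(wantedValue)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```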

Jun 10 '20 15:06 at055612

The snag with this is that ref data is local to a node and likely only present on processing nodes. A node receiving an API call for ref data may have to federate the query out to other nodes. Either it could ask all nodes to run the query, or we could maintain and cache some kind of manifest of which nodes hold which maps for which time ranges, so that the controlling node could pass the query to a node that is believed to have the data.
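One way the manifest option could look, as a sketch only. `ManifestEntry` and `QueryRouter` are hypothetical, and the sketch assumes each node can report the maps it holds as half-open time ranges `[from, to)`. An empty result is the cue to fall back to asking all nodes.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// Hypothetical manifest entry: node nodeName holds map mapName for [from, to).
record ManifestEntry(String nodeName, String mapName, Instant from, Instant to) {
    boolean covers(String map, Instant time) {
        return mapName.equals(map) && !time.isBefore(from) && time.isBefore(to);
    }
}

final class QueryRouter {
    private final List<ManifestEntry> cachedManifest;

    QueryRouter(List<ManifestEntry> cachedManifest) {
        this.cachedManifest = List.copyOf(cachedManifest);
    }

    // Returns a node believed to hold the data. The manifest may be stale,
    // so the chosen node could still miss; empty means "ask all nodes".
    Optional<String> chooseNode(String mapName, Instant lookupTime) {
        return cachedManifest.stream()
                .filter(e -> e.covers(mapName, lookupTime))
                .map(ManifestEntry::nodeName)
                .findFirst();
    }
}
```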

Jun 26 '20 06:06 at055612

This functionality partially exists in v7.0-beta.143. The user can query the ref data store, but only on the local node. In systems where the UI nodes don't do any processing, this means the query will return nothing unless you manually load the required reference data into the store by using the API to trigger the load.

What we need is the ability to define a reference datasource entity in the explorer tree. The entity can then define one or more loaders it will use, as is done in the properties of an XSLT. The dashboard user can then select one of these ref datasources to query, and it would operate the same way as an XSLT lookup, i.e. triggering a load if the data isn't available. The XSLT properties could then be changed to allow selection of a single ref datasource rather than adding multiple ref feeds.
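A rough sketch of the load-on-miss behaviour described for the ref datasource entity. `RefDataSourceDoc` and `LoadingRefStore` are hypothetical, the loader is an injected function rather than a real pipeline, and effective times are ignored for brevity. It also assumes loaders are tried in the order listed and the first hit wins, which should be checked against how XSLT lookups actually resolve multiple ref feeds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical explorer entity: a named, ordered list of loader pipelines.
record RefDataSourceDoc(String name, List<String> loaderPipelines) {}

// Hypothetical local store that, like an XSLT lookup, loads on a miss and then reads.
final class LoadingRefStore {
    // loader pipeline -> (map name -> (key -> value))
    private final Map<String, Map<String, Map<String, String>>> loaded = new HashMap<>();
    private final Function<String, Map<String, Map<String, String>>> loader;
    int loadCount = 0;

    LoadingRefStore(Function<String, Map<String, Map<String, String>>> loader) {
        this.loader = loader;
    }

    Optional<String> lookup(RefDataSourceDoc doc, String mapName, String key) {
        for (String pipeline : doc.loaderPipelines()) {
            // Trigger a load only if this loader's data is not already present.
            Map<String, Map<String, String>> maps = loaded.computeIfAbsent(pipeline, p -> {
                loadCount++;
                return loader.apply(p);
            });
            String value = maps.getOrDefault(mapName, Map.of()).get(key);
            if (value != null) {
                return Optional.of(value); // assumed: first loader with a hit wins
            }
        }
        return Optional.empty();
    }
}
```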

To make this possible the query needs some additional information:

effective time (default to now if not set)
node - i.e. which node to attempt the lookup/load on
map name (optional). This could be captured via query attributes, see #2420.
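A sketch of the extra query parameters and of how an effective time could pick a stream, assuming the usual effective-stream rule: the stream in effect is the latest one whose effective time is at or before the lookup time. `RefQueryParams` and `EffectiveStreams` are hypothetical names; `TreeMap.floorEntry` does the "latest at or before" search.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical carrier for the additional query information listed above.
record RefQueryParams(Instant effectiveTime, String nodeName, String mapName) {
    RefQueryParams {
        if (effectiveTime == null) {
            effectiveTime = Instant.now(); // default to now if not set
        }
    }
}

// Hypothetical index of one ref feed's streams by effective time.
final class EffectiveStreams {
    private final TreeMap<Instant, Long> streamsByEffectiveTime = new TreeMap<>();

    void add(Instant effectiveTime, long streamId) {
        streamsByEffectiveTime.put(effectiveTime, streamId);
    }

    // Latest stream whose effective time is at or before the given time, if any.
    Optional<Long> streamAt(Instant time) {
        Map.Entry<Instant, Long> entry = streamsByEffectiveTime.floorEntry(time);
        return Optional.ofNullable(entry).map(Map.Entry::getValue);
    }
}
```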

The datasource screen could also contain a tabbed page for doing a single lookup, i.e. prompting the user for the map, key and effective time, optionally loading the data, and then querying the store to display the value. An additional button to trigger a load (either by stream ID or by date) would also be useful.

Sep 10 '21 13:09 at055612

There are two use cases here: queries against ref data using the dashboard, and a user doing a lookup of a single key against a single map. The former is useful when figuring out what pipeline lookups are doing. The latter is useful for users who just want to get a value for a key in a map, e.g. looking up a user's details.

A Reference Data datasource in the explorer tree would need two tabs, Settings and Lookup. Visibility of the settings tab would be controlled by an app permission so lowly users would not need to see it. Settings would contain the list of ref loaders as described above. Lookup would have the following input fields:

Map name
Key
Lookup time (if not set, use now())

It would also have:

A text pane to show the returned value (which could be XML or a string)
A label to show the effective time of the stream of the returned value
Maybe the stream ID for the returned value

Potentially the settings tab could also set an optional extraction pipeline that will transform XML values into something more friendly, e.g. HTML to go into the text pane.
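A sketch of the display transform, using a single JAXP XSLT transform as a stand-in for an extraction pipeline (a real pipeline would involve more than one XSLT step). It assumes the value has already been deserialised from FastInfoSet into XML text, and it treats anything not starting with `<` as a plain string to show as-is. `ValueRenderer` is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ValueRenderer {
    private ValueRenderer() {}

    // Hypothetical helper: applies an XSLT to an XML ref data value for display.
    // Plain string values are returned unchanged.
    static String render(String value, String xslt) {
        if (!value.stripLeading().startsWith("<")) {
            return value;
        }
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(value)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform ref data value", e);
        }
    }
}
```

For the text pane the stylesheet would normally use `<xsl:output method="html"/>`; the check below uses text output to keep the result easy to compare.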

When the user executes a lookup, it would use the defined ref loaders to determine which effective streams it needs for the lookup. It would then ask the nodes whether they have all of those streams loaded in their ref store. To avoid hitting all nodes, it could ask them serially and asynchronously, with say a 50ms delay between each, so that it is likely to find a suitable node before hitting too many others. It stops as soon as it finds one. Once a suitable node is found, it offloads the lookup query to it to get the result, but does not trigger a load (e.g. in case the data got purged at about the same time).
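A sketch of the staggered probing, assuming each node exposes some async "do you have these stream IDs loaded?" call, modelled here as a hypothetical `hasStreams` function. Probe `i` is scheduled `i * delayMs` after the start without waiting for earlier answers; the first "yes" completes the result and any probe not yet started is skipped. A failed probe counts as "no". `StaggeredNodeFinder` is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

final class StaggeredNodeFinder {
    private StaggeredNodeFinder() {}

    // Completes with the first node reporting all required streams loaded,
    // or empty once every probed node has answered "no".
    static CompletableFuture<Optional<String>> findNode(
            List<String> nodes,
            Set<Long> requiredStreamIds,
            BiFunction<String, Set<Long>, CompletableFuture<Boolean>> hasStreams,
            long delayMs) {
        CompletableFuture<Optional<String>> result = new CompletableFuture<>();
        if (nodes.isEmpty()) {
            result.complete(Optional.empty());
            return result;
        }
        AtomicInteger outstanding = new AtomicInteger(nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get(i);
            var delayed = CompletableFuture.delayedExecutor(i * delayMs, TimeUnit.MILLISECONDS);
            CompletableFuture.runAsync(() -> {
                if (result.isDone()) {
                    outstanding.decrementAndGet();
                    return; // a suitable node was already found, skip this probe
                }
                hasStreams.apply(node, requiredStreamIds).whenComplete((ok, err) -> {
                    if (err == null && Boolean.TRUE.equals(ok)) {
                        result.complete(Optional.of(node)); // first "yes" wins
                    }
                    if (outstanding.decrementAndGet() == 0) {
                        result.complete(Optional.empty()); // no-op if already completed
                    }
                });
            }, delayed);
        }
        return result;
    }
}
```

The lookup itself would then be sent only to the returned node, with no load triggered there.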

If no nodes have the streams loaded then it can just give the user a warning to say the data is not loaded. If there are no effective streams then the user can get a warning about that.
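A sketch of the three outcomes a single lookup could end in, assuming the effective stream IDs and the result of the node search are already known. The sealed interface and record names are hypothetical, and the message wording is illustrative only. The "no effective streams" case is checked first, since without streams there is nothing for any node to have loaded.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical outcomes of a single-key lookup request.
sealed interface LookupOutcome permits Found, NoEffectiveStreams, NotLoaded {}
record Found(String nodeName) implements LookupOutcome {}
record NoEffectiveStreams() implements LookupOutcome {}
record NotLoaded(List<Long> streamIds) implements LookupOutcome {}

final class LookupOutcomes {
    private LookupOutcomes() {}

    static LookupOutcome classify(List<Long> effectiveStreamIds, Optional<String> nodeWithData) {
        if (effectiveStreamIds.isEmpty()) {
            return new NoEffectiveStreams();
        }
        return nodeWithData.<LookupOutcome>map(Found::new)
                .orElseGet(() -> new NotLoaded(List.copyOf(effectiveStreamIds)));
    }

    // Illustrative user-facing text for each outcome.
    static String message(LookupOutcome outcome) {
        if (outcome instanceof Found f) {
            return "Querying node " + f.nodeName();
        }
        if (outcome instanceof NotLoaded n) {
            return "Warning: reference data not loaded on any node for streams " + n.streamIds();
        }
        return "Warning: no effective reference streams for this lookup time";
    }
}
```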

Feb 02 '23 13:02 at055612