update and extend event history section based on Crossbar.io implementation
see example at https://github.com/crossbario/crossbarexamples/tree/master/event-history
For all contributors: This is working code, and we think this is a workable solution, but feedback is welcome, as always.
Yeah, I agree: feedback from other implementors (of routers) would be nice to have ..
I haven't implemented event history yet, but I will in the next few weeks. One question: why did you call wamp.subscription.get_events instead of wamp.topic.history.list? Which should I use?
"store": {
    "type": "memory",
    "limit": 100,
    "event-history": [
        {
            "uri": "com.example.oncounter",
            "match": "exact",
            "limit": 1000
        }
    ]
}
Does this mean you store 1000 events for this topic, but only up to a limit of 100MB of memory?
Seems wrong to get event history by having events sent to a subscription. A subscription is for getting events as they are published, and events would be delivered individually (which could be painful with large history). Also, subscription will only get EVENT and will not see other important info associated with history, like timestamp (when event happened). What seems correct is to get event history as the result of a special RPC call - essentially another meta procedure (wamp.topic.history.x). This way the history can be delivered in a single result, or progressive results if necessary for large history. Additional historical info can be retrieved, like event timestamp.
Here are my thoughts on FIXME items in http://wamp-proto.org/static/rfc/draft-oberstet-hybi-crossbar-wamp.html#rfc.section.14.4.8.1
1. Topic URI should be used, not subscription id: Event history should not require a subscription to get history. Asking for history to be sent to a subscription requires additional checks to ensure that the subscription belongs to the same session (we do not want to replay events to an unsuspecting recipient). Event history should be returned as the result of a call to wamp.topic.history.x. To get pattern-based history, the call should take an Options|dict in which the matching policy can be specified, just as with a subscribe message, e.g. {"match": "prefix"}. This determines how the topic|uri parameter is compared with saved event topics.
2. wamp.topic.history.after can be implemented efficiently either by storing events in the order received, or by sorting by timestamp and returning everything with a later timestamp than the "after" publication.
3. Pattern-based matching is specified by an additional Options|dict containing the match policy.
4. This should work the same as other meta procedures.
5. Events published using black/white listing with session ID are not saved.
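To make point 2 concrete, here is a minimal sketch of the "store events in the order received" option. It is not taken from any router; `StoredEvent`, `OrderedHistory` and their fields are hypothetical names chosen for illustration, and eviction of old events is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StoredEvent:
    # Hypothetical record layout, not taken from the spec.
    publication_id: int
    topic: str
    timestamp_ns: int
    args: List[Any] = field(default_factory=list)


class OrderedHistory:
    """Append-only event log kept in arrival order (hypothetical helper)."""

    def __init__(self) -> None:
        self._events: List[StoredEvent] = []
        # publication ID -> position in the log, so "after" needs no scan
        self._index: Dict[int, int] = {}

    def append(self, event: StoredEvent) -> None:
        self._index[event.publication_id] = len(self._events)
        self._events.append(event)

    def after(self, publication_id: int) -> List[StoredEvent]:
        """Return every event stored after the given publication."""
        position = self._index.get(publication_id)
        if position is None:
            # Unknown publication: there is nothing to anchor on.
            return []
        return self._events[position + 1:]


history = OrderedHistory()
for pub_id in (101, 102, 103):
    history.append(StoredEvent(pub_id, "com.example.oncounter", pub_id * 1000))

newer = [e.publication_id for e in history.after(101)]
```

With arrival order preserved, "after" is a dictionary lookup plus a slice. The timestamp-sorted variant would need a binary search instead.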
let me answer in parts, because after rereading the original context, and @gammazero response, I guess I agree with some parts;)
I repost the original context:
Should we use topic|uri or subscription|id in Arguments?
Since we need to be able to get history for pattern-based subscriptions as well, a subscription|id makes more sense: create pattern-based subscription, then get the event history for this.
The only restriction then is that we may not get event history without a current subscription covering the events. This is a minor inconvenience at worst.
Can wamp.topic.history.after be implemented (efficiently) at all?
How does that interact with pattern-based subscriptions?
The same question as with the subscriber lists applies where: to stay within our separation of roles, we need a broker + a separate peer which implements the callee role. Here we do not have a mechanism to get the history from the broker.
How are black/whitelisted sessionIDs treated? A client which requests event history will have a different sessionID than on previous connections, and may receive events for which it was excluded in the previous session, or not receive events for which it was whitelisted. - see <https://github.com/wamp-proto/wamp-proto/issues/206>
Topic URI should be used, not subscription id
This is the part where I do not agree, as in: No, both the live event dispatching and the event history use subscription IDs, as per the WAMP protocol spec. There are, however, WAMP meta API procedures to map a topic to a subscription ID, essentially allowing user code to reuse the mapping logic inside a router. But I'm not sure I get what you are aiming for here.
And actually, I agree with all the rest, 2. - 5. above. These puzzle pieces look solved, and the sketches @gammazero outlines seem to work for me ... needs a paragraph or section in the spec though of course;)
there is wamp.subscription.match (eg here) that allows matching a URI pattern to a subscription ID, and then there is wamp.subscription.get_events (eg here) to retrieve events from the history of the subscription.
there are a couple of known issues, missing pieces in the spec, and missing bits in implementations.
at least these:
- flexible, efficient filtering and selection of the event history retrieved
- semantics for authorization rules under persistence and the event history feature in general
- per-event persistence options
@oberstet OK, I misunderstood what was being asked for in the first case where subscription ID was being used to retrieve events. I thought the retrieved historical events were sent to the specified subscription. Now I see that the call to wamp.subscription.get_events is returning the historical events that were (or would have been) delivered to that subscription. That looks good to me. My whole concern was that event history was returned as the result of a meta RPC call, which it is, so good.
As for the other points above:
- The flexible, efficient filtering... This seems like it needs to be left up to the implementation. Maybe one uses RDBMS, another uses different organization and search according to what it wants to optimize. What the spec should strive for is a clear description of the expected functionality, so that implementors can figure out how best to optimize given their tools and preferences. I don't think spec can say much about efficient implementation.
- Semantics for authz... Only after passing authz is the message persistence policy evaluated for the message. Let's not mix message authz with authz for persistence. Evaluate separately.
- Per-event persistence options. I can think of a number of options (expiration time, max storage, max number of messages). However, the semantics of such options are more important than the actual options. What I mean is to decide how these options are applied:
- Persist certain types of events?
- Persist events from certain sources?
- Persist events with some other property?
Black/white listing, except for when session ID is used, should apply to event history. This way client requesting event history will only receive events that are allowed to their authrole/authid.
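A rough sketch of how that rule could look when history is read back. The option names `exclude_authid`, `exclude_authrole`, `eligible_authid` and `eligible_authrole` are the WAMP publish options; everything else (`StoredPublication`, `session_scoped`, `should_persist`, `visible_to`) is hypothetical, and the sketch assumes that every restriction the publisher supplied has to pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredPublication:
    publication_id: int
    # Publisher-supplied options; None means "no restriction of this kind".
    exclude_authid: Optional[List[str]] = None
    exclude_authrole: Optional[List[str]] = None
    eligible_authid: Optional[List[str]] = None
    eligible_authrole: Optional[List[str]] = None
    # True if the publisher restricted receivers by session ID (hypothetical flag).
    session_scoped: bool = False


def should_persist(pub: StoredPublication) -> bool:
    """Session IDs do not survive reconnects, so such events are not saved."""
    return not pub.session_scoped


def visible_to(pub: StoredPublication, authid: str, authrole: str) -> bool:
    """Apply the authid/authrole lists to a client asking for history."""
    if pub.exclude_authid and authid in pub.exclude_authid:
        return False
    if pub.exclude_authrole and authrole in pub.exclude_authrole:
        return False
    if pub.eligible_authid is not None and authid not in pub.eligible_authid:
        return False
    if pub.eligible_authrole is not None and authrole not in pub.eligible_authrole:
        return False
    return True


stored = [
    StoredPublication(1),
    StoredPublication(2, exclude_authrole=["guest"]),
    StoredPublication(3, eligible_authid=["alice"]),
    StoredPublication(4, session_scoped=True),
]
persisted = [p for p in stored if should_persist(p)]
guest_view = [p.publication_id for p in persisted if visible_to(p, "bob", "guest")]
alice_view = [p.publication_id for p in persisted if visible_to(p, "alice", "user")]
```

The filter runs at retrieval time against the requesting session's authid and authrole, so it still works for a client that reconnected with a new session ID.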
My whole concern was that event history was returned as the result of a meta RPC call, which it is, so good.
Yep, this is how it is designed.
flexible, efficient filtering
I would say, the "efficient" part is definitely an implementation detail. However, what we could do is specify filtering parameters for the meta API procedures ..
Semantics for authz
There are multiple aspects that we need to discuss and add to the spec:
- what if the publisher isn't authorized to publish, but the router configuration says "persist that topic"?
- then, what authorization rules apply when accessing the history?
- does it make sense to disallow subscribing, but allow accessing history?
- how does it interact with bw-listing?
Per-event persistence options
Ok, this is something I would really think carefully about, as it has deep implications. The main point is: do we want to give control to the client over persistence?
Knobs that are under router control are a different thing .. but those are probably more a router implementation detail. Eg crossbar allows to define a history max size, but currently not a max age. of course straightforward to add ..
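For illustration only, a router-side history buffer with both knobs could look roughly like this. It is not Crossbar code; `BoundedHistory`, `max_size` and `max_age_s` are made-up names, and the `now` parameter exists only so the example is deterministic.

```python
import time
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class BoundedHistory:
    """History buffer limited by size and, optionally, by age (hypothetical)."""

    def __init__(self, max_size: int, max_age_s: Optional[float] = None) -> None:
        self._max_age_s = max_age_s
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._events: Deque[Tuple[float, Any]] = deque(maxlen=max_size)

    def add(self, payload: Any, now: Optional[float] = None) -> None:
        timestamp = time.time() if now is None else now
        self._events.append((timestamp, payload))

    def get(self, limit: int, now: Optional[float] = None) -> List[Any]:
        if limit <= 0:
            return []
        now = time.time() if now is None else now
        if self._max_age_s is not None:
            cutoff = now - self._max_age_s
            # Entries are in arrival order, so expired ones sit at the left.
            while self._events and self._events[0][0] < cutoff:
                self._events.popleft()
        return [payload for _, payload in list(self._events)[-limit:]]


h = BoundedHistory(max_size=3, max_age_s=60.0)
for i in range(5):
    h.add(f"event-{i}", now=100.0 + i)  # timestamps 100 .. 104

recent = h.get(limit=10, now=110.0)  # size limit kept only the last three
later = h.get(limit=10, now=163.5)   # age limit drops everything before 103.5
```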
@oberstet Have we reached a solid decision on whether or not to request history by topic URI or by subscription ID? I think there are some arguments in favor of both, but at this point I am biased toward using subscription ID. The reasons:
- This makes it very clear what events are kept in history: Keep all events that go to that subscription. This makes it much easier to support history for wildcard subscriptions. If requesting history by wildcard URI, then history would need to be kept for all events in order to return history for any matching wildcard URI.
- Crossbar supports event history query by subscription ID, so clients that already use this will be automatically interoperable with other routers that provide event history.
If we agree that event history should be queried by subscription ID, I will submit a document PR with this as well as other answers to the "FIXME" section.
request history by topic URI or by subscription ID
so crossbar implements both, as in:
@wamp.register('wamp.registration.get')
def registration_get(self, registration_id, details=None):
    """
    Get registration details.
    """

@wamp.register('wamp.registration.match')
def registration_match(self, procedure, details=None):
    """
    Given a procedure URI, return the registration best matching the procedure.
    """

@wamp.register('wamp.registration.lookup')
def registration_lookup(self, procedure, options=None, details=None):
    """
    Given a procedure URI (and options), return the registration (if any) managing the procedure.
    """
and similar for subscriptions https://gist.github.com/oberstet/12eaa37c7dd937f4b0330be67a92dd03
actually, I don't know right now which pieces we've agreed upon, either because we already have merged spec text, or from a discussion on some issue / the mailing list :(
I agree: let's try to nail this ... finally ... and including proper spec text ...
fwiw, here is the raw list of meta API procedures implemented by crossbar (again, not sure if all of that is really agreed / standard / in the spec .. but anyways. that is what crossbar actually implements):
- [X] wamp.session.list
- [X] wamp.session.count
- [X] wamp.session.get
- [X] wamp.session.add_testament
- [X] wamp.session.flush_testaments
- [X] wamp.session.kill
- [X] wamp.session.kill_by_authid
- [X] wamp.session.kill_by_authrole
- [ ] wamp.registration.remove_callee
- [ ] wamp.subscription.remove_subscriber
- [X] wamp.registration.get
- [X] wamp.subscription.get
- [X] wamp.registration.list
- [X] wamp.subscription.list
- [X] wamp.registration.match
- [X] wamp.subscription.match
- [X] wamp.registration.lookup
- [X] wamp.subscription.lookup
- [X] wamp.registration.list_callees
- [X] wamp.subscription.list_subscribers
- [X] wamp.registration.count_callees
- [X] wamp.subscription.count_subscribers
- [ ] wamp.subscription.get_events
I've added "checkboxes" which we might use to mark "agreed / spec'ed" items ..
Edited by @gammazero: Checked all boxes that I believe are agreed on in the WAMP specification. The three outstanding items are:
- wamp.registration.remove_callee
- wamp.subscription.remove_subscriber (Issue #253)
- wamp.subscription.get_events (this issue)
Since wamp.subscription.match and wamp.subscription.lookup are already finalized in the spec (and implemented in multiple router implementations), these provide a way for a client to get subscription IDs by URI search. The actual call to the meta procedure wamp.subscription.get_events or wamp.topic.history.last/since/after can then specify the subscription ID.
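The two-step flow described above might look like this from the client side. This is only a sketch: `fetch_history` is a hypothetical helper, `session` is assumed to be any object with an awaitable `call(procedure, *args)` method, and the `(subscription_id, limit)` arguments of wamp.subscription.get_events follow what Crossbar appears to implement rather than agreed spec text. `FakeSession` and the event dictionaries are invented so that the example runs without a router.

```python
import asyncio
from typing import Any, List


async def fetch_history(session: Any, topic: str, limit: int = 10,
                        match: str = "exact") -> List[Any]:
    """Resolve a topic to a subscription ID, then ask for its history."""
    sub_id = await session.call("wamp.subscription.lookup", topic, {"match": match})
    if sub_id is None:
        # No subscription currently covers this topic/match pair.
        return []
    return await session.call("wamp.subscription.get_events", sub_id, limit)


class FakeSession:
    """Stand-in for a router connection so the sketch runs offline."""

    def __init__(self) -> None:
        self._subs = {("com.example.oncounter", "exact"): 7001}
        self._events = {7001: [{"publication": 1, "args": [1]},
                               {"publication": 2, "args": [2]}]}

    async def call(self, procedure: str, *args: Any) -> Any:
        if procedure == "wamp.subscription.lookup":
            topic, options = args
            return self._subs.get((topic, options.get("match", "exact")))
        if procedure == "wamp.subscription.get_events":
            sub_id, limit = args
            return self._events.get(sub_id, [])[-limit:]
        raise ValueError(f"unexpected procedure: {procedure}")


events = asyncio.run(fetch_history(FakeSession(), "com.example.oncounter", limit=1))
missing = asyncio.run(fetch_history(FakeSession(), "com.example.unknown"))
```

The empty result for an unknown topic shows the restriction discussed earlier: without a subscription covering the events, there is no ID to ask history for.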
A more important clarification that the spec needs to make is whether this is history of events that were published to a specific topic, or whether this is history of events that were received on a specific subscription (which could be wildcard). This affects what event history is returned for wildcards that do not have a subscription (for which history is not explicitly kept).
The current spec is not clear about it, but seems to imply the former. However, I think the latter is easier to reason about what should be saved, and what should be expected when retrieving history, and is what Crossbar does -- saves events that are delivered to a specific subscription.
If we get history by subscription ID (implement as per Crossbar), then meta procedure should be called wamp.subscription.get_events, which would be exactly as per Crossbar. wamp.subscription.get_events.since and wamp.subscription.get_events.after would extend this.
If we get history by URI, then we need more decisions/specification about how to get history for wildcard and prefix URIs. If we get history by subscription ID, then this is already defined by what wamp.subscription.list/match/lookup return and whether history is being kept for those subscription IDs.
So I got to this topic too.
I read all the spec and discussions here and there and have a couple of comments.
In my view, operating on topic URIs instead of subscription IDs is more interesting because it is more end-user-oriented. Let me explain:
For the end user, only the topic URI makes sense, not the subscription ID. Yes, we have it in the spec and of course use it under the hood, but different implementations expose different public APIs, and some of them hide internals such as the various IDs. So in some cases the end user doesn't even know the ID of the topic he/she wants to publish to, unsubscribe from, etc.
If we force users to get history by subscription ID, they have to store this ID and take additional steps to resolve URI→ID. IDs are runtime things, but topics belong more to the business-logic side.
Also, regarding wildcard/prefix: if we operate on topic URIs, it's just the same approach. Treat the event history component in your router like a subscriber peer, subscribed to some exact/prefix/wildcard topics. If an event matches, store it. If not, ignore it. Yes, it can be memory intensive, but you can provide a router configuration for this: store the last N events for any topic, store M events for a specific exact/prefix/wildcard topic.
So for an exact topic, it will be a real event history. For pattern/wildcard, yeah, more active topics will supersede events for other topics. But nevertheless, in terms of asking for the event history of a wildcard/prefix topic, it will still be a fair last N messages.
So the only confusion will happen when the router is configured to store 100 events for a prefix/wildcard topic, and the user asks for the event history of an exact topic included in this group that is less active than the others, which results in 0 events being returned.
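A small sketch of this topic-based idea, including the confusing case at the end. Nothing here is from an existing router: `HistoryRule`, `TopicHistory` and `uri_matches` are hypothetical, and the matching follows the usual WAMP exact/prefix/wildcard policies in simplified form.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Iterable, List, Tuple


def uri_matches(pattern: str, policy: str, topic: str) -> bool:
    """Simplified WAMP-style URI matching for the three match policies."""
    if policy == "exact":
        return topic == pattern
    if policy == "prefix":
        return topic.startswith(pattern)
    if policy == "wildcard":
        want, have = pattern.split("."), topic.split(".")
        # An empty pattern component matches any single topic component.
        return len(want) == len(have) and all(w in ("", h) for w, h in zip(want, have))
    raise ValueError(f"unknown match policy: {policy}")


@dataclass
class HistoryRule:
    uri: str
    match: str
    limit: int
    events: Deque[Tuple[str, Any]] = field(init=False)

    def __post_init__(self) -> None:
        self.events = deque(maxlen=self.limit)


class TopicHistory:
    """History component acting like a subscriber on configured topics."""

    def __init__(self, rules: Iterable[HistoryRule]) -> None:
        self._rules = list(rules)

    def on_publish(self, topic: str, payload: Any) -> None:
        for rule in self._rules:
            if uri_matches(rule.uri, rule.match, topic):
                rule.events.append((topic, payload))

    def for_rule(self, uri: str, match: str) -> List[Tuple[str, Any]]:
        for rule in self._rules:
            if (rule.uri, rule.match) == (uri, match):
                return list(rule.events)
        return []

    def for_exact_topic(self, topic: str) -> List[Any]:
        for rule in self._rules:
            if uri_matches(rule.uri, rule.match, topic):
                return [p for t, p in rule.events if t == topic]
        return []


store = TopicHistory([HistoryRule("com.example.", "prefix", limit=3)])
store.on_publish("com.example.quiet", "q1")
for i in range(3):
    store.on_publish("com.example.busy", f"b{i}")
store.on_publish("com.other.topic", "ignored")

group = store.for_rule("com.example.", "prefix")     # a fair "last 3" of the group
quiet = store.for_exact_topic("com.example.quiet")   # superseded by the busy topic
```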
Another point about topic uri VS subscription Id - is that in case of topic uri event history feature can be implemented standalone, in case of subscription id - the router also needs to implement subscription meta API.
So for now we have a topic uri approach written in the spec, and subscription id approach implemented in XB...
Well, regardless of all, we need to synchronize, agree and finalize this.
A more important clarification that the spec needs to make is whether this is history of events that were published to a specific topic, or whether this is history of events that were received on a specific subscription (which could be wildcard). This affects what event history is returned for wildcards that do not have a subscription (for which history is not explicitly kept).
Yeah, in case of going with subscription id we should update the spec with terms accordingly.
@oberstet @gammazero @konsultaner what do you think? :)
Regarding black/white listing:
Well... this can still be applied to active sessions, but in general I find it useless. Event history is mostly for cases when a new session connects to the router and wants to get some information about what's going on there. So its session ID will probably not be listed in any black/white lists. Instead, black/whitelisting based on auth IDs and auth roles makes sense. What is black/whitelisting for? To keep some clients from getting access to protected data, right? So we are not filtering abstract sessions, but rather the clients, aka the users standing behind those sessions. So operating on auth attributes is, I think, more correct. And even if a user has 2, 3, or 10 active sessions, they still connect to the router using the same credentials.
what do you think? :)
@KSDaemon According to @oberstet's example, I disagree with using the topic instead of the subscription_id. But to be honest, I don't get the point of why we need an optional match. Doesn't the subscription_id logically include the matching policy? Maybe I'm not getting it right.
@konsultaner Well, it seems that you mixed things up a bit :)
As @oberstet is on the subscription_id side and not on the topic one :)
For subscription-based history, we do not need any policy info, because a subscription ID has a rather clean meaning: get the events that were put into exactly that one subscription. I intentionally do not use the term "published", because publishing does not have a one-to-one relation with a subscription (because of wildcards and prefixes). You publish to a topic, which results in the same event being put into 0 or more subscriptions.
A more important clarification that the spec needs to make is whether this is history of events that were published to a specific topic, or whether this is history of events that were received on a specific subscription (which could be wildcard). This affects what event history is returned for wildcards that do not have a subscription (for which history is not explicitly kept).
Yeah, in case of going with subscription id we should update the spec with terms accordingly.
I've read the thread, and from this, and from what @konsultaner said, it's my impression that we agree on making/keeping subscription_id (and registration_id) the key for event (and call) history?
fwiw, should that be the case, and should the spec text need a resync, of course I'm +1 on that!
also, this is how it's currently implemented in crossbar, here is the interface
https://github.com/crossbario/crossbar/blob/ccbae0a56ef331fe00192455837aa72578eadc54/crossbar/interfaces.py#L251
and an in-memory store
https://github.com/crossbario/crossbar/blob/ccbae0a56ef331fe00192455837aa72578eadc54/crossbar/router/realmstore.py
and an LMDB database store
https://github.com/crossbario/crossbar/blob/ccbae0a56ef331fe00192455837aa72578eadc54/crossbar/edge/worker/realmstore.py
which uses this schema
https://github.com/crossbario/cfxdb/blob/master/cfxdb/realmstore.fbs
and python bindings
https://github.com/crossbario/cfxdb/tree/master/cfxdb/realmstore
for easier reading, and as an additional argument for why subscription/registration IDs are the cleaner approach (making such IDs the "primitive type" to use, rather than URIs and match policies), here is the database schema crossbar uses to store events
/// This table store WAMP events dispatched to receivers, under WAMP subscriptions on URIs (or patterns).
table Event
{
    /// Timestamp when the event was sent to the receiver. Epoch time in ns.
    timestamp: uint64;
    /// The subscription ID this event is dispatched under.
    subscription: uint64;
    /// The publication ID of the dispatched event.
    publication: uint64;
    /// The WAMP session ID of the receiver.
    receiver: uint64;
    /// Whether the message was retained by the broker on the topic, rather than just published.
    retained: bool;
    /// Whether this Event was to be acknowledged by the receiver.
    acknowledged_delivery: bool;
}
if there wasn't a subscription ID already present in the event, that table would need to invent one so that the DB can store multiple events with the subscription metadata such as URI and match policy normalized into a separate table ...
one more note rgd the database schema .. I just recognized, what's available in crossbar (via cfxdb) is "correct" (from a data modeling perspective), but it is incomplete:
When a publisher publishes an event, that event including payload will be stored as a Publication record
https://github.com/crossbario/cfxdb/blob/b8f301c6f2cb7dcafbcc58a12ef442d70625a1a6/cfxdb/realmstore.fbs#L122
This single published event can then match one or more subscriptions, which are configured according to
https://github.com/crossbario/cfxdb/blob/b8f301c6f2cb7dcafbcc58a12ef442d70625a1a6/cfxdb/arealm.fbs#L68
However, a run-time persisted record for a subscription is missing. Such a record would come into existence once a first client has subscribed.
Then the router will match the published event to all active subscriptions, and dispatch the event to each subscribing client on these subscriptions.
Each dispatched event will persist a separate database record in Event
https://github.com/crossbario/cfxdb/blob/b8f301c6f2cb7dcafbcc58a12ef442d70625a1a6/cfxdb/arealm.fbs#L104
IOW: the crossbar database model currently misses a Subscription table. This could be easily added, and then filled in
https://github.com/crossbario/crossbar/blob/ccbae0a56ef331fe00192455837aa72578eadc54/crossbar/router/broker.py#L866
by implementing
https://github.com/crossbario/crossbar/blob/ccbae0a56ef331fe00192455837aa72578eadc54/crossbar/interfaces.py#L207
Read spec, this thread, related crossbar docs once again and still have no final thoughts how it should be done %))
So, here is a new portion from my side:
We are mostly leaning toward the subscription_id based approach. Okay. It's implemented in XB. Cool. But even the XB config is based on URI, and not on subscription_IDs! %) Of course it's hard to name the subscription_ids of the topics in advance, as they are only generated at runtime. But having URIs in the config raises the same questions again: what if one topic matches more than one subscription? So the router has to save all events published to the topic because it cannot know what subscriptions may be created in the future, right? (that is just what Tobias described above)
Another point in favor of topic-based history is the following example: Let's say we have a bunch of services that publish log events to some topic. And there is one subscriber service that processes them. Let's imagine a situation where this subscriber service starts later for some reason. So all publishers have already begun to publish events on this topic. In common cases (without event history) these events will just be thrown away, as there are no subscribers! With the event history feature turned on, the router has to save the publications to the topic URI in any case, and not just the events, because when the first peer subscribes to that history-enabled topic and asks for history, the router has to replay all those publications, transform them into events, and deliver them as the result of the history request.
So even if the event history API is based on subscription ID, the router has to store publications. Operating on topics and publications under the hood may simplify the router logic. The router can store only publications and, on receiving an event history request, just replay the stored publications. So why not create an API based on topics? In this case the router may store only publications and transform them into events on request.
And for me personally looking at event history from the publications side seems to be more straight forward: it doesn't matter how many subscriptions are there: exactly same uri, a few prefix or wildcard based subs. The original publication is only ONE. That's it. Even if it can fall into a few pattern-based subscriptions under the hood.
And from the implementation side: the router receives a publication, then just like finding matching subscriptions, it can find matching configurations for storing event history, then it stores if needed the original publication, processes, and sends events as usual. That's it.
@KSDaemon thanks for your thoughts and comments! interesting .. many different aspects and perspectives;) fwiw, pls let me add some specific notes in reply further down below.
I'm not sure this whole issue still makes sense anyways: what's the goal?
Of the current router implementations, those that provide any kind of persistency are 2-3? Not even sure which ones. Those will behave differently today. And that's probably fine.
In any case: if we really wanted to practically achieve interop rgd history API for different routers, much more would be needed (API DSL, automated test cases, ..) which seems unrealistic given the lack of resources/interest ..
But even XB config is based on URI, and not on subscription_IDs!
It's based on URI and match policy, but yeah, any ID used internally is just that: internal, not exposed in the config.
The run-time is based on (internal) IDs, and since the history API also uses IDs, there are 2 helpers as well
https://github.com/crossbario/crossbar/blob/master/crossbar/router/service.py#L850 https://github.com/crossbario/crossbar/blob/master/crossbar/router/service.py#L798
what if one topic matches more than one subscription?
an event matches a given "topic + match policy". such a pair exists in configuration (in crossbar at least) with some ID, and can exist at run-time with some other ID for the run-time subscription (when there is at least one subscriber).
a subscriber that is subscribed on multiple subscriptions that match one given event will receive an event multiple times with different subscription IDs. I think we had some spec text regarding multiple matching subscriptions .. not sure.
So even if event history API is based on subscription id, the router has to store publications.
Yes, which is why crossbar allows to configure persistency also based on the same topic/match-policy tuple
https://github.com/crossbario/crossbar-examples/blob/cfe7d866b2872eebdce3de5d311c6894c2868809/event-history/.crossbar/config.json#L44
The original publication is only ONE.
yes, which is why there is a PUBLICATION table
then it stores if needed the original publication, processes, and sends events as usual
no, the events dispatched under subscriptions in the past (the history) depended on the configuration in the past, and on the actual availability of subscribers in the past. it cannot simply be recomputed today (after the fact), so consequently there is an EVENT table as well. that is: there are 2 tables! IMO, throwing away information is not more straightforward;) it often appears so, but is always a modeling bug coming back to bite ... at the latest, when data analysts look at the data. my world is quite simple: every bit of original info is stored and there is no "delete" in the database. and in this specific case, there is a 1:n relation between PUBLICATION and EVENT. actually, it has to be, since WAMP has 2 roles related to PubSub with the router in between, so there need to be 2 tables. Similar with RPC: it would need CALL/RESULT and INVOCATION/YIELD table pairs.
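To illustrate the 1:n relation between the two tables, here is a toy in-memory model. The `Event` fields reuse names from the schema quoted earlier in the thread; the `Publication` fields, the `RealmStore` class and its methods are hypothetical and do not reflect the actual cfxdb or Crossbar code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Publication:
    # One row per PUBLISH accepted by the router (hypothetical layout).
    publication: int
    topic: str
    args: Tuple


@dataclass(frozen=True)
class Event:
    # One row per dispatch to a receiver under a subscription.
    timestamp: int
    subscription: int
    publication: int
    receiver: int


class RealmStore:
    """Toy two-table store: PUBLICATION 1:n EVENT."""

    def __init__(self) -> None:
        self.publications: Dict[int, Publication] = {}
        self.events: List[Event] = []

    def store_publication(self, pub: Publication) -> None:
        self.publications[pub.publication] = pub

    def store_event(self, event: Event) -> None:
        self.events.append(event)

    def get_events(self, subscription: int, limit: int = 10) -> List[Publication]:
        """Join EVENT rows of one subscription back to their publications."""
        seen: List[int] = []
        for ev in self.events:
            if ev.subscription == subscription and ev.publication not in seen:
                seen.append(ev.publication)
        return [self.publications[p] for p in seen[-limit:]]


store = RealmStore()
store.store_publication(Publication(501, "com.example.oncounter", (1,)))
store.store_publication(Publication(502, "com.example.oncounter", (2,)))

# Publication 501 matched two subscriptions and reached three receivers.
store.store_event(Event(1000, 7001, 501, 11))
store.store_event(Event(1001, 7001, 501, 12))
store.store_event(Event(1002, 7002, 501, 11))
# Publication 502 arrived while nobody was subscribed: no Event rows at all.

history_7001 = [p.publication for p in store.get_events(7001)]
```

Publication 502 is still on record, but it does not show up in any subscription's history, which is the information that would be lost by recomputing events from publications after the fact.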
Hey folks! Everyone who is participating in this discussion is welcome to PR https://github.com/wamp-proto/wamp-proto/pull/428. I tried to accumulate and order all discussions into specification text.
Closing it via #428 PR