Provide API to stream/replay persisted events from the event journal
Ditto applies the "Event Sourcing" paradigm for its persistence (of things, policies and connections):
- https://blog.softwaremill.com/3-reasons-to-adopt-event-sourcing-89cb855453f6
- https://eventuate.io/whyeventsourcing.html
Ditto, however, does not yet make much use of that persistence paradigm. I would like to propose providing APIs for retrieving a stream of persisted events from the "event journal". That could be useful, e.g., in order to:
- replay events which were "missed" (similar to what Apache Kafka provides)
- query the changes made to a single property value of a thing
- get an "audit log" of changes done to a thing/policy/connection
The API would only be available in a streaming fashion, i.e. via:
- SSE
- WebSocket via DittoProtocol
- Connections via DittoProtocol
The DittoProtocol API should be very similar to the search protocol.
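Analogous to the search protocol's subscribe/request/next messages, a subscription message for such a history stream might look like the following (topic suffix and value fields are purely illustrative — nothing here is an existing Ditto API):

```json
{
  "topic": "org.eclipse.ditto/thing-2/things/twin/streaming/subscribeForPersistedEvents",
  "path": "/",
  "value": {
    "fromHistoricalRevision": 0
  }
}
```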
As the Ditto background cleanup would delete events quite fast, we would also need a configurable "event retention duration" - so it would be possible to configure how long events and snapshots are kept in the DB before the cleanup finally deletes them. Example: 7 days
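Such a retention setting could be sketched as a HOCON fragment like the one below (the key name and its location are hypothetical; no such setting exists yet):

```hocon
ditto {
  things {
    cleanup {
      # hypothetical setting: keep events and snapshots for 7 days
      # before the background cleanup is allowed to delete them
      history-retention-duration = 7d
    }
  }
}
```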
An idea of what the SSE API could look like:
/api/2/things/org.eclipse.ditto:thing-2?fromRevision=0&fields=thingId,attributes,_revision,_modified
The stream would automatically end when the most recent event was returned.
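A client for this proposed endpoint could be sketched as follows — the host name is a placeholder, and `fromRevision`/`fields` are the proposed (not yet existing) query parameters:

```python
# Sketch of building the proposed SSE history URL for replaying
# persisted events of a thing. Host and parameters are hypothetical.
from urllib.parse import urlencode

BASE = "https://ditto.example.com/api/2/things"  # placeholder host


def history_url(thing_id: str, from_revision: int, fields: list[str]) -> str:
    """Build the proposed replay URL, starting at the given revision."""
    query = urlencode(
        {"fromRevision": from_revision, "fields": ",".join(fields)},
        safe=",",  # keep commas in the fields list readable
    )
    return f"{BASE}/{thing_id}?{query}"


url = history_url("org.eclipse.ditto:thing-2", 0,
                  ["thingId", "attributes", "_revision", "_modified"])
print(url)
```

The resulting URL matches the example above; the actual events would then be consumed as a regular `text/event-stream` response until the stream ends at the most recent event.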
This API could be provided for things, policies and connections (for connections, however, probably only via piggyback commands). The idea would be to use either:
- the _revision number of the entity as "from/to" selector
- or the _modified timestamp of the event journal entry (which is included in the MongoDB `_id` field)
With the timestamp, an SSE query to get all changes within one hour could look like:
/api/2/things/org.eclipse.ditto:thing-2?fromModified=2022-09-29T11:00:00.000Z&toModified=2022-09-29T12:00:00.000Z&fields=thingId,attributes,features,_revision,_modified
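Building such a one-hour window programmatically could look like this (again, `fromModified`/`toModified` are proposed parameters and the host is a placeholder):

```python
# Sketch: construct the proposed time-window query for one hour of changes.
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso_utc(dt: datetime) -> str:
    """Format a UTC datetime with millisecond precision, as in the examples."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


to_modified = datetime(2022, 9, 29, 12, 0, 0, tzinfo=timezone.utc)
from_modified = to_modified - timedelta(hours=1)  # window starts one hour earlier

query = urlencode(
    {
        "fromModified": iso_utc(from_modified),
        "toModified": iso_utc(to_modified),
        "fields": "thingId,attributes,features,_revision,_modified",
    },
    safe=",:",  # keep commas and the colons inside the timestamps readable
)
url = f"https://ditto.example.com/api/2/things/org.eclipse.ditto:thing-2?{query}"
print(url)
```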
Input on whether that would be useful by our community would be much appreciated @BobClaerhout @w4tsn @tobias-zeptio @thlandgraf
Hello, we found that the historical data in Ditto can be queried in the things_journal collection of the MongoDB database in Docker, but we also found that this historical data is deleted at a fixed interval, so that only the latest data is retained in the database. We would like to know: can this interval be modified? And if so, where?
@hjccjugfcc part of this issue is the idea to provide an API to get exactly this information from e.g. the "things_journal", and also to have a configurable "retention time" for how long events are kept in the database before Ditto cleans them up in the background.
Do you have other requirements/ideas what you would expect from a "historical API access"?
@thjaeckle My partner asked the question above; let us describe our idea more clearly. At present we are trying to use Ditto as an IoT digital twin platform for autonomous vehicles in our project. We collect the state data of every vehicle in motion (speed, steering wheel angle, etc.), group all of that state into a JSON document as a thing, and send it directly to Ditto via MQTT. Each vehicle is identified by one unique thing ID, but the state data changes continuously and new data frequently overwrites the old. For autonomous driving, however, the historical data not only helps to monitor vehicle movement, but also provides data for autonomous driving research, especially regarding perception, planning and safety. That is why we genuinely need historical data. We believe that an API to retrieve exactly this information would be useful, and we hope Ditto can keep the complete historical data in the database for at least one week before cleaning it up, which would give us plenty of time to export it.
@CabenJL if you are planning to export the "historical data" anyhow, you could also (already now, without the history API) create a Ditto-managed connection to e.g. an "Apache Kafka" broker and configure it to publish all changes made to your things.
Whenever the state of a thing (digital twin) is updated, this "event" will automatically be published to the Kafka broker, from where you can build up a data lake or store the data however you need.
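Such a connection could be sketched roughly as below, using Ditto's connectivity model with a target subscribed to all twin events. The broker host, target topic name and authorization subject are placeholders; see Ditto's connectivity API documentation for the full connection model:

```json
{
  "id": "kafka-export",
  "connectionType": "kafka",
  "connectionStatus": "open",
  "uri": "tcp://kafka.example.com:9092",
  "specificConfig": {
    "bootstrapServers": "kafka.example.com:9092"
  },
  "targets": [
    {
      "address": "ditto-twin-events",
      "topics": ["_/_/things/twin/events"],
      "authorizationContext": ["nginx:ditto"]
    }
  ]
}
```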