
Provide API to stream/replay persisted events from the event journal

Open thjaeckle opened this issue 2 years ago • 1 comment

Ditto applies the "Event Sourcing" paradigm for its persistence (of things, policies and connections):

  • https://blog.softwaremill.com/3-reasons-to-adopt-event-sourcing-89cb855453f6
  • https://eventuate.io/whyeventsourcing.html

However, Ditto does not yet make much use of this persistence paradigm. I would like to propose providing APIs to retrieve a stream of persisted events from the "event journal". That could be useful, e.g., in order to:

  • replay events which were "missed" (similar to what Apache Kafka provides)
  • query the changes made to a single property value of a thing
  • get an "audit log" of changes done to a thing/policy/connection

The API would only be available via the streaming channels:

  • SSE
  • WebSocket via DittoProtocol
  • Connections via DittoProtocol

The DittoProtocol API should be very similar to the search protocol.
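To illustrate, a history subscription message could be modeled on the search protocol's "subscribe" message. The `history/subscribe` topic action and the `value` fields below are invented for this proposal and do not exist in Ditto today — this is only a sketch of the envelope shape:

```python
# Hypothetical Ditto Protocol message for subscribing to a thing's event
# history, modeled on the existing search protocol's "subscribe" message.
# The "history/subscribe" topic action and the value fields are invented
# for this proposal; only the topic/path/value envelope is standard.
import json

subscribe_history = {
    "topic": "org.eclipse.ditto/thing-2/things/twin/history/subscribe",
    "path": "/",
    "value": {
        "fromRevision": 0,
        "fields": "thingId,attributes,_revision,_modified",
    },
}

print(json.dumps(subscribe_history, indent=2))
```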

As the Ditto background cleanup deletes events quite quickly, we would also need a configurable "event retention duration", i.e. how long to keep events and snapshots in the DB before the cleanup finally deletes them. Example: 7 days.
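Since Ditto is configured via HOCON, such a retention duration could be exposed there. The key names below are purely hypothetical and do not exist in Ditto's configuration at the time of writing:

```hocon
# Hypothetical configuration sketch - these key names are not (yet) part of Ditto.
ditto.things {
  cleanup {
    # keep events and snapshots at least this long before the
    # background cleanup deletes them from the journal
    history-retention-duration = 7d
  }
}
```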

An idea of how the SSE API could look:

/api/2/things/org.eclipse.ditto:thing-2?fromRevision=0&fields=thingId,attributes,_revision,_modified

The stream would automatically end when the most recent event was returned.
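A client-side sketch of how such a stream could be consumed: the helper below builds the proposed URL and parses the `data:` lines of a Server-Sent-Events stream. The endpoint and the `fromRevision` parameter are part of this proposal, not an existing Ditto API, and the base URL is made up:

```python
# Sketch of a client for the proposed history SSE endpoint.
# The endpoint and "fromRevision" parameter are part of this proposal,
# not an existing Ditto API; the host name is illustrative.
from urllib.parse import urlencode

def history_url(base, thing_id, from_revision=0,
                fields=("thingId", "attributes", "_revision", "_modified")):
    # keep commas in the "fields" value unencoded, as in the example URL above
    query = urlencode(
        {"fromRevision": from_revision, "fields": ",".join(fields)}, safe=",")
    return f"{base}/api/2/things/{thing_id}?{query}"

def parse_sse(lines):
    """Yield the payload of each 'data:' line of an SSE stream."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

url = history_url("https://ditto.example.org", "org.eclipse.ditto:thing-2")
print(url)
```

In a real client, `parse_sse` would be fed the response lines of an HTTP request kept open until the server ends the stream at the most recent event.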

This API could be provided for things, policies and connections (for connections, however, probably only via piggyback commands). The idea would be to use either:

  • the _revision number of the entity as "from/to" selector
  • or the _modified timestamp of the event journal entry (which is included in the MongoDB _id field)

With the timestamp, an SSE query to get all changes within one hour could look like:

/api/2/things/org.eclipse.ditto:thing-2?fromModified=2022-09-29T11:00:00.000Z&toModified=2022-09-29T12:00:00.000Z&fields=thingId,attributes,features,_revision,_modified
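Building such a one-hour window programmatically could look like the sketch below. The `fromModified`/`toModified` parameters are part of this proposal, not an existing Ditto API:

```python
# Sketch: build the proposed "fromModified"/"toModified" query parameters
# for a one-hour window ending at a given instant. The parameters are part
# of this proposal, not an existing Ditto API.
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

def iso_millis(ts):
    # millisecond precision with a trailing 'Z', matching the example URL
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"

def window_query(to_ts, hours=1):
    from_ts = to_ts - timedelta(hours=hours)
    # keep the ':' characters of the timestamps unencoded for readability
    return urlencode(
        {"fromModified": iso_millis(from_ts), "toModified": iso_millis(to_ts)},
        safe=":")

q = window_query(datetime(2022, 9, 29, 12, 0, 0, tzinfo=timezone.utc))
print(q)
```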

thjaeckle avatar Sep 29 '22 14:09 thjaeckle

Input on whether that would be useful by our community would be much appreciated @BobClaerhout @w4tsn @tobias-zeptio @thlandgraf

thjaeckle avatar Sep 29 '22 14:09 thjaeckle

Hello, we found that the historical data of Ditto can be queried in the things_journal collection of the MongoDB database in Docker, but we also found that this historical data is deleted at a fixed interval, so that only the most recently updated data is retained in the database. We would like to know: can this fixed interval be modified? If so, where should it be modified?

hjccjugfcc avatar Oct 26 '22 08:10 hjccjugfcc

@hjccjugfcc part of this issue is the idea to provide an API to get exactly this information from e.g. the "things_journal". And also to have a configurable "retention time" for how long to keep the events in the database before Ditto cleans them up in the background.

Do you have other requirements or ideas for what you would expect from a "historical API access"?

thjaeckle avatar Oct 26 '22 09:10 thjaeckle

@thjaeckle My partner asked the question above; let me describe our idea more clearly. At present, we are trying to use Ditto as an IoT digital twin for autonomous vehicles in our project. We obtain the state data of every vehicle in motion, such as speed and steering wheel angle, group all the state into JSON format as a thing, and send it directly to Ditto via the MQTT protocol. During autonomous driving, every vehicle is identified by one unique thing ID, but the state data changes continuously and new data frequently overwrites the old. For autonomous driving, however, the historical data not only helps to monitor vehicle movement as a reference, but also provides data for autonomous-driving algorithm research, especially regarding perception, planning and safety guarantees. That is why we genuinely need historical data. We believe that an API to retrieve exactly this information would be useful, and we hope Ditto can keep the complete historical data in the database for at least one week before cleaning it up, which would give us plenty of time to export it.

CabenJL avatar Oct 27 '22 12:10 CabenJL

@CabenJL if you are planning to export the "historical data" anyhow, you could also (already now, without the history API) create a Ditto managed connection to e.g. an "Apache Kafka" broker and configure it to publish all changes made to your things.

Whenever the state of a thing (digital twin) is then updated, this "event" will automatically be published to the Kafka broker from where you can build up a data lake or whatever you need to do to store the data.
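To give an idea of what a consumer on the Kafka side would receive: each change arrives as a Ditto Protocol event envelope (topic/path/value/revision). The envelope structure is Ditto's standard protocol; the concrete topic, path and values below are made up for illustration:

```python
# Sketch: handling a Ditto Protocol "modified" event as it would arrive on
# a Kafka topic from a Ditto managed connection. The topic/path/value/revision
# envelope is the standard Ditto Protocol; the concrete values are invented.
import json

sample = json.dumps({
    "topic": "org.eclipse.ditto/thing-2/things/twin/events/modified",
    "path": "/attributes/speed",
    "value": 42.5,
    "revision": 17,
})

def handle_event(raw):
    event = json.loads(raw)
    # the first two topic segments are the thing's namespace and name
    namespace, name = event["topic"].split("/")[:2]
    thing_id = f"{namespace}:{name}"
    return thing_id, event["path"], event["value"], event["revision"]

print(handle_event(sample))
```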

thjaeckle avatar Oct 27 '22 12:10 thjaeckle