RADAR-Backend
End-to-End Platform Test Requirements
We should stand up the platform stack so we can run a demonstrative end-to-end evaluation of: sensor interface --> data ingestion --> processing --> storage --> emission from the REST API --> visualisation in a dashboard (or an equivalent client such as TransMart).
Discussion on schema requirements for this should go here: RADAR-CNS/RADAR-Schemas#3
@blootsvoets has set up some of the Confluent components on EC2; could that serve as a basis for this?
So far, the Confluent part is set up, but that is only the data ingestion. Step by step:
- sensor interface: I'm collaborating with @MaximMoinat to define an API for the sensor to submit data to the system. The first draft of this is in RestProducer.java.
- Ingestion: a standard Confluent setup on EC2. Schemas for the Empatica E4 are defined in the schema directory. If we decide on some more general schemas (e.g. `battery_level.json` instead of `empatica_e4_battery_level.json`), those could be used instead.
- Processing: BatteryLevelMonitor.java contains a pilot battery level monitor, which should run on the EC2 instance. No schema validation is done; the device schema is assumed.
- Storage: not yet set up; currently everything is stored in Kafka without being consumed and stored elsewhere.
- Output REST API: not set up.
- Visualisation: not set up.
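To make the first arrow concrete, here is a minimal sketch of what a sensor-facing submission could look like, posting one record through the Confluent REST Proxy. The topic name, endpoint and payload fields are illustrative assumptions, not the actual draft in RestProducer.java:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Sketch of a sensor-facing REST producer posting one record to the
 *  Confluent REST Proxy. Topic, endpoint and payload fields are assumptions. */
public class RestProducerSketch {

    /** Build a REST Proxy request body for a single battery level reading. */
    static String buildPayload(String deviceId, double time, float batteryLevel) {
        return "{\"records\":[{"
                + "\"key\":{\"deviceId\":\"" + deviceId + "\"},"
                + "\"value\":{\"time\":" + time
                + ",\"batteryLevel\":" + batteryLevel + "}}]}";
    }

    public static void main(String[] args) throws Exception {
        String body = buildPayload("empatica-e4-01", 1.48e9, 0.85f);
        System.out.println(body);
        // Pass the REST Proxy topic URL as an argument to actually send, e.g.
        // http://localhost:8082/topics/empatica_e4_battery_level
        if (args.length > 0) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(args[0]).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type",
                    "application/vnd.kafka.json.v1+json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }
}
```

The actual API will likely use Avro with the Schema Registry rather than plain JSON; this only shows the shape of the interaction.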
With docker-compose and Travis we could at least test each individual arrow. Given the many moving parts and unknowns, I'm inclined to cover complete end-to-end testing with a monitoring solution, in combination with a test or dev deployment. What do you think?
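For the per-arrow tests, a minimal docker-compose sketch could bring up the ingestion side; the image names and versions are assumptions based on the public Confluent images, not a tested configuration:

```yaml
# Sketch: ZooKeeper + Kafka + Schema Registry for exercising single "arrows" in CI.
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:3.1.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:3.1.1
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:3.1.1
    depends_on: [kafka]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: zookeeper:2181
```

A Travis job could then run a producer and consumer against this stack and assert that records round-trip.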
I'm working through the entire pipeline without using devices; I have created a simple data producer and a consumer group. Next steps are:
- fix some bugs in the consumer group
- add the database layer by means of a Kafka connector
- connect the database layer to the REST API, partially implemented in Java on top of Apache Tomcat
My setup runs on my laptop inside multiple Docker containers. As soon as the code is stable, I will push it to the repository.
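The consumer-group side of that pipeline could be configured roughly as follows; the broker address, group id and deserializers here are illustrative assumptions, not the actual code:

```java
import java.util.Properties;

/** Sketch of the configuration for one member of a Kafka consumer group.
 *  Broker address, group id and deserializers are illustrative assumptions. */
public class ConsumerConfigSketch {

    static Properties consumerProperties(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // All consumers sharing this group.id split the topic's partitions
        // between them, so adding instances scales out consumption.
        props.put("group.id", groupId);
        // Commit offsets automatically so a restarted consumer resumes
        // roughly where it left off.
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProperties("radar-test-group");
        System.out.println("group.id = " + props.getProperty("group.id"));
    }
}
```

These properties would be passed to a `KafkaConsumer` that polls the topic in a loop; the actual deserializers will be Avro-based once the schemas are settled.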
I think there are three candidate devices likely to be put forward for the early-stage testing, in this order:
- Empatica
- Pebble 2
- a phone sensor (GPS, accelerometer or similar)

The Empatica is the more stationary proposal in the Epilepsy pilot, with the Pebble for the more ambulatory setting. Given the direction of travel, using the Empatica for the first end-to-end test seems clearly the way to go. I'm waiting for confirmation from WP6 on the depression pilot device; then I'll add an issue around integrating that device.
Update on Storage
I worked on activating cold storage for the platform. I integrated an adapted HDFS Connector to write the key and value data of messages to HDFS.
As part of validation, a test was performed covering "sensor interface --> data ingestion --> processing --> storage (COLD)" with the Empatica E4: Android application on ResPi --> Confluent platform --> HDFS. A relatively long-running (5 to 6 hour) test actively streamed data in real time and stored it to HDFS.
Integrating the HDFS Connector is comparatively straightforward: it can be done by tuning configurations and is documented here. Running multiple connectors with different `flush.size` values, matched to the sampling frequencies, is a good way to commit a single file per topic for a given time interval (e.g. every hour).
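As an illustration of that `flush.size` tuning (the topic name, connector name and HDFS URL are assumptions): the Empatica E4 accelerometer samples at 32 Hz, so committing roughly one file per hour means flushing after 32 * 3600 records:

```properties
# Sketch of one HDFS sink connector instance for a high-frequency topic.
name=hdfs-sink-acceleration
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=empatica_e4_acceleration
hdfs.url=hdfs://localhost:9000
# 32 Hz * 3600 s = 115200 records, i.e. about one committed file per hour.
flush.size=115200
```

A lower-rate topic such as battery level would get its own connector instance with a correspondingly smaller `flush.size`.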