fiware-cygnus
Cygnus merges notifications when receiving a large volume of them in a short period of time.
Today I've been hit with what seems to be a weird bug in Cygnus. A little background first:
We have a device that, instead of sending measurements as they are collected, accumulates them (one every ~10 minutes, each including the time of the measurement) and then delivers all of them in a batch at specific points in time (every several hours). These batches are sent to the iotagent-json, which updates the matching entity in Orion, and a notification is sent to Cygnus for every measurement in the batch. So far, so good. Cygnus is configured with PostgreSQL and Hadoop as backends. Orion notifications are configured with no throttling, as we want to store every measurement.
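The subscription that delivers notifications to Cygnus simply omits the throttling field; roughly something like this (a minimal NGSIv2 sketch, where the entity id/type, service headers and Cygnus endpoint are only illustrative):
curl -s -X POST 'http://orion:1026/v2/subscriptions' \
  -H 'Content-Type: application/json' \
  -H 'Fiware-Service: test' \
  -H 'Fiware-ServicePath: /sensors' \
  -d '{
    "description": "Notify Cygnus of every change (no throttling on purpose)",
    "subject": { "entities": [ { "id": "sensor01", "type": "mysensor" } ] },
    "notification": { "http": { "url": "http://cygnus:5050/notify" } }
  }'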
For the PostgreSQL backend we have configured:
cygnus-ngsi.sinks.postgresql-sink.attr_persistence = row
cygnus-ngsi.sinks.postgresql-sink.data_model = dm-by-service-path
As one of the attributes is the date of the measurement, we want to obtain all the attributes in the database that were measured on a given date. I thought that fetching all the attribute rows that share the recvtimets of that date attribute would give me that (the intended query is sketched below). Then I saw something like this:
postgres=# select * from test.sensors where recvtimets='1529406607326' and attrname='measurement_date';
recvtimets | recvtime | fiwareservicepath | entityid | entitytype | attrname | attrtype | attrvalue | attrmd
---------------+--------------------------+-------------------+----------+------------+------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1529406607326 | 2018-06-19T11:10:07.326Z | /sensors | sensor01 | mysensor | measurement_date | string | 1529385000000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.451Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
1529406607326 | 2018-06-19T11:10:07.326Z | /sensors | sensor01 | mysensor | measurement_date | string | 1529377200000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.473Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
1529406607326 | 2018-06-19T11:10:07.326Z | /sensors | sensor01 | mysensor | measurement_date | string | 1529386800000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.463Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
(3 rows)
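For reference, the kind of query I had in mind was something like this (just a sketch, using the table and attribute names from the example above and one of the epoch dates shown there):
-- Fetch every attribute row stored with the same recvtimets as a
-- given measurement_date value.
select a.*
from test.sensors a
join test.sensors d on d.recvtimets = a.recvtimets
where d.attrname = 'measurement_date'
  and d.attrvalue = '1529385000000';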
Three different measurements ended up with the same recvtimets. At first I thought it had something to do with the iotagent, but every measurement was sent independently and its log shows no errors. Checking Orion, everything was OK too. Then I checked the Cygnus log and there was one notification received for each of the measurements, so that is fine as well. The data stored in the Hadoop backend shows the same problem.
So somewhere between Cygnus receiving the notifications and Cygnus storing them in the backend, it merges some of them as if they were received as a single notification.
I've been able to reproduce this behaviour with the following setup:
- Mosquitto MQTT Broker
- Iotagent-json
- Orion Context Broker
- Cygnus
- PostgreSQL
Using the sensor from the step-by-step guide, I then sent temperature measurements with:
for i in $(seq 1 200) ; do mosquitto_pub -t /1234/sensor01/attrs -m "{\"t\": $i}" -u admin -P admin ; done
And checking the database, some notifications have been merged:
postgres=# select * from myhome.environment where recvtimets='1529934582282';
recvtimets | recvtime | fiwareservicepath | entityid | entitytype | attrname | attrtype | attrvalue | attrmd
---------------+--------------------------+-------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------
1529934582282 | 2018-06-25T13:49:42.282Z | /environment | LivingRoomSensor | multiSensor | Temperature | celsius | 177 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-25T13:49:41.913Z"}]
1529934582282 | 2018-06-25T13:49:42.282Z | /environment | LivingRoomSensor | multiSensor | Temperature | celsius | 169 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-25T13:49:41.856Z"}]
(2 rows)
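The merged timestamps can be located with a grouping query like this one (again just a sketch against the same table):
-- List the reception timestamps holding more than one Temperature row,
-- i.e. the notifications that were merged under a single recvtimets.
select recvtimets, count(*) as merged_rows
from myhome.environment
where attrname = 'Temperature'
group by recvtimets
having count(*) > 1
order by recvtimets;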
I've been searching, but have been unable to find anything related to this bug.
Thanks.
Regards.
Cygnus works in batch mode by default. You can try reducing the batch size (CYGNUS_POSTGRESQL_BATCH_SIZE) or decreasing the timeout (CYGNUS_POSTGRESQL_BATCH_TIMEOUT) so that different measures are stored with different recvtimets values.
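For reference, if you are editing agent.conf directly rather than using the docker environment variables, those should correspond to the sink's batch_size and batch_timeout parameters, for example (the values here are only illustrative):
cygnus-ngsi.sinks.postgresql-sink.batch_size = 1
cygnus-ngsi.sinks.postgresql-sink.batch_timeout = 5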
I think the recvtimets is generated when the notification is converted into an NGSIEvent object, before it is put into the channel, not at the sink batch-processing stage. Anyway, I tried your suggestion and there was no difference in the results; also, setting the timeout to 0 seems to disable batch processing altogether.
Thanks.
Regards.
Could you try using MySQL as the storage backend instead of PostgreSQL? Just to check whether the behaviour is the same (in which case the problem may be in the common event handler) or whether in that case it is correct (in which case the problem should be in the PostgreSQL sink).
I'm also using the HDFS backend and the same problem is present there, with the exact same values. Here's an example:
- PostgreSQL
cygnus=# select * from testservice.env where recvtimets = '1530167672972';
recvtimets | recvtime | fiwareservicepath | entityid | entitytype | attrname | attrtype | attrvalue | attrmd
---------------+--------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------
1530167672972 | 2018-06-28T06:34:32.972Z | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius | 175 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T06:34:32.776Z"}]
1530167672972 | 2018-06-28T06:34:32.972Z | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius | 181 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T06:34:32.778Z"}]
(2 rows)
- HDFS
root@hadoop-namenode:/# hdfs dfs -ls -R /
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user/root
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user/root/myhome
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user/root/myhome/environment
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user/root/myhome/environment/sensors
drwxr-xr-x - root supergroup 0 2018-06-28 06:34 /user/root/myhome/environment/sensors/LivingRoomSensor_multiSensor
-rwxr-xr-x 2 root supergroup 51425 2018-06-28 06:35 /user/root/myhome/environment/sensors/LivingRoomSensor_multiSensor/LivingRoomSensor_multiSensor.txt
root@hadoop-namenode:/# hdfs dfs -cat /user/root/myhome/environment/sensors/LivingRoomSensor_multiSensor/LivingRoomSensor_multiSensor.txt | grep '2018-06-28T06:34:32.972Z'
{"recvTime":"2018-06-28T06:34:32.972Z","fiwareServicePath":"/environment/sensors","entityId":"LivingRoomSensor","entityType":"multiSensor", "Temperature":"175", "Temperature_md":[{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T06:34:32.776Z"}]}
{"recvTime":"2018-06-28T06:34:32.972Z","fiwareServicePath":"/environment/sensors","entityId":"LivingRoomSensor","entityType":"multiSensor", "Temperature":"181", "Temperature_md":[{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T06:34:32.778Z"}]}
I could try the MySQL backend later if that's still needed.
Having confirmed the same behaviour in the HDFS storage backend probably makes the additional test on MySQL unnecessary. Although it is also true that the more situations in which the issue is verified, the better. In summary, as you prefer :)
I've tested it, just in case. Same results:
mysql> select * from environment_sensors_LivingRoomSensor_multiSensor where recvtimets = '1530197123903';
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
| recvTimeTs | recvTime | fiwareServicePath | entityId | entityType | attrName | attrType | attrValue | attrMd |
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
| 1530197123903 | 2018-06-28T14:45:23.903 | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius | 14 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T14:45:23.722Z"}] |
| 1530197123903 | 2018-06-28T14:45:23.903 | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius | 10 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T14:45:23.721Z"}] |
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
PostgreSQL and HDFS have the same values.
Which Cygnus version are you using? The one coming from the master branch?
I'm using the latest docker image available in docker hub (this build):
$ docker images --no-trunc | grep cygnus | grep latest
fiware/cygnus-ngsi latest sha256:7b834ce9560d712a200d08a1c9e05962e5fb6f9b1d1926a1765816550cc38994 39 hours ago 516MB
From the container log:
$ docker-compose logs cygnus | grep version
cygnus_1 | time=2018-06-29T07:14:27.816Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=N/A | op=main | msg=com.telefonica.iot.cygnus.nodes.CygnusApplication[171] : Starting Cygnus, version 1.9.0_SNAPSHOT.5e63327cecadb2dfc650b83ca3c7a60ae5968bb9
I've also been hit by this bug. I'm using the HDFS sink and sending two entities with different id and type to Orion, to which Cygnus is subscribed, but Cygnus is merging both notifications and storing the data in the same text file in HDFS. I'm using Cygnus v1.10.0.