Add MQTT support to ERDDAP

MQTT is a lightweight publish/subscribe messaging protocol for efficiently transmitting real-time data to and from sensors. https://en.wikipedia.org/wiki/MQTT

Since ERDDAP has datasets with near-real-time data (e.g., in data files which change periodically), it would be nice to add an MQTT real-time data service to ERDDAP.

1. Allow suitable EDDTableFromFiles datasets to be made available as publishers so that MQTT subscribers can be quickly and efficiently notified when there is new data. Much of this task would be to create a way for ERDDAP administrators to identify suitable datasets, and to have each dataset identify and cache in memory its most recent row of data so that it is available for MQTT publishing. Note that one dataset may represent hundreds of stations/sensors, e.g., https://coastwatch.pfeg.noaa.gov/erddap/files/cwwcNDBCMet/nrt/ Subscribers should be able to subscribe to one, some, or all stations in a dataset.
2. Make EDDTableFromHttpGet (which is the only type of dataset in ERDDAP that supports data ingest) into an MQTT subscriber. Then an EDDTableFromHttpGet dataset could connect to a sensor which is acting as an MQTT publisher in order to gather data from the sensor.
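
The "one, some, or all stations" requirement maps naturally onto MQTT topic filters, where `+` matches exactly one topic level and `#` matches all remaining levels. The sketch below is only an illustration of that idea: the class name `StationTopics` and the `erddap/<datasetID>/<stationID>` topic layout are assumptions made here, not anything ERDDAP defines. It uses no MQTT library and ignores the special handling of topics that begin with `$`.

```java
/** Hypothetical helper; the names and the topic layout are assumptions, not ERDDAP API. */
final class StationTopics {
    private StationTopics() {}

    /** Builds a per-station topic such as "erddap/cwwcNDBCMet/46088" (assumed layout). */
    static String topicFor(String datasetId, String stationId) {
        return "erddap/" + datasetId + "/" + stationId;
    }

    /** Returns true if the MQTT topic filter matches the concrete topic name. */
    static boolean matches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        for (int i = 0; i < f.length; i++) {
            if (f[i].equals("#")) {
                // '#' matches every remaining level and is only valid as the last level.
                return i == f.length - 1;
            }
            if (i >= t.length) {
                return false; // the filter has more levels than the topic
            }
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false; // a literal level differs
            }
        }
        return f.length == t.length;
    }
}
```

With this layout, a subscriber interested in one station would use the full topic, a subscriber interested in all stations of a dataset would use `erddap/cwwcNDBCMet/+`, and "some" stations would simply be several single-station subscriptions on the same connection.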

ERDDAP is a Java servlet, so the solution needs to be cross-platform. Existing libraries to be used: Paho from Eclipse? An MQTT broker (e.g., Mosquitto)?

Possible people to talk with about needs: Someone in IOOS(?) has a Sensor Data Ingest Project.

Skills required: Java programming. I think you can learn about MQTT while working on the project.

Difficulty: Difficult. This could easily take a couple of months.

Mentor: Bob Simons (main author of ERDDAP)

Please also read the Programmer's Guide at https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#programmersGuide especially the "Judging Your Code Contributions" section.

Feb 01 '21 23:02 BobSimons

@mwengren is the ringleader of the Sensor Data Ingest project, and yes MQTT is going to be at the core of what we are scheming.

ERDDAP probably shouldn't try to integrate a broker, but it would be nice if it could support connecting to a variety of external ones (with various auth mechanisms).

One thing to ponder is how ERDDAP should template MQTT topics and messages, or whether it should use a more fixed format.

For 1: ERDDAP should probably only publish new data on every refresh, thus it should be possible without adding a cache as only a momentary diff will be needed. The MQTT broker is in charge of making sure anything published to a topic is available (or not) for any late subscribers.
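
One way to read "only a momentary diff" is that the publisher keeps just a per-station timestamp of what it has already sent, rather than cached rows, and on each refresh selects the rows that are newer. The sketch below illustrates that reading only; `RefreshDiff`, `Row`, and the use of epoch seconds are assumptions made here and do not reflect how ERDDAP stores or reloads data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of selecting rows to publish after a dataset refresh. */
final class RefreshDiff {

    /** One observation: station ID, time in epoch seconds, and an opaque payload. */
    static final class Row {
        final String station;
        final long epochSeconds;
        final String payload;

        Row(String station, long epochSeconds, String payload) {
            this.station = station;
            this.epochSeconds = epochSeconds;
            this.payload = payload;
        }
    }

    // Latest timestamp already published for each station (the only state kept).
    private final Map<String, Long> lastPublished = new HashMap<>();

    /** Returns the rows newer than anything published so far and advances the marks. */
    List<Row> selectNew(List<Row> refreshedRows) {
        List<Row> fresh = new ArrayList<>();
        Map<String, Long> newMarks = new HashMap<>();
        for (Row r : refreshedRows) {
            long mark = lastPublished.getOrDefault(r.station, Long.MIN_VALUE);
            if (r.epochSeconds > mark) {
                fresh.add(r);
                // Track the newest time seen for this station in this refresh.
                newMarks.merge(r.station, r.epochSeconds, Math::max);
            }
        }
        lastPublished.putAll(newMarks);
        return fresh;
    }
}
```

Each returned row would then be published to its station's topic, and whether late subscribers see the latest value is left to the broker (for example via retained messages), as described above.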

Feb 02 '21 00:02 abkfenris

Morning @BobSimons,

This looks super interesting. Can I ask if it's possible that this item:

Make EDDTableFromHttpGet (which is the only type of dataset in ERDDAP that supports data ingest) into an MQTT subscriber. Then an EDDTableFromHttpGet dataset could connect to a sensor which is acting as MQTT publisher in order to gather data from the sensor.

is split into two pieces, such that EDDTableFromHttpGet can optionally be set as a subscriber, and a new EDDTableFromMQTT is designed/coded for 'pure' message-based delivery?

This is something that the National Oceanography Centre have thought about during a pilot project around coastal hazard warnings, which is currently using the HttpGet method for sensor data delivery. https://digitalenvironment.org/a-creamt-that-delivers-health-checked-hazard-warnings/

Feb 02 '21 11:02 thogar-computer

@thogar-computer Most of EDDTableFromHttpGet's code deals with how to store the data sent to it (in different files, based on administrator settings and on the data itself, i.e., which sensor and what time) and how to keep the internal database with information about each file up-to-date. So I thought it made sense to add an optional MQTT system which simply feeds the incoming data into the existing data input system.

I think there are advantages to putting the MQTT support into the existing EDDTableFromHttpGet, not a separate class. Notably, that way, a sensor could be providing the raw data via MQTT and other editors (human or software) could be providing changes via HttpGet.

Feb 02 '21 16:02 BobSimons

@abkfenris, you asked about payload formats. ERDDAP now uses JEXL, which can parse expressions or scripts. I think JEXL can be used to solve the problem.

So for MQTT data ingest with EDDTableFromHttpGet, it seems like a good idea to consider having the ERDDAP administrator specify a bit of JEXL code which would be responsible for converting the MQTT message (in whatever format it is in) into the format that ERDDAP wants.
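
To make the shape of that conversion step concrete, here is a dependency-free sketch. It does not use JEXL: a plain Java `Function` stands in for the administrator-supplied script, and the class name `MqttIngestSketch`, the `key=value` payload format, and the column names are all assumptions chosen for illustration. The point is only that the converter turns an arbitrary payload into a column-name-to-value row, which could then be handed to the existing ingest code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; a Java function stands in for an administrator's JEXL script. */
final class MqttIngestSketch {
    private MqttIngestSketch() {}

    /** Example converter for payloads like "stationID=46088,time=2021-02-02T16:00:00Z,wtmp=8.1". */
    static final Function<String, Map<String, String>> KEY_VALUE_CSV = payload -> {
        Map<String, String> row = new LinkedHashMap<>();
        for (String pair : payload.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Bad field: " + pair);
            }
            row.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return row;
    };

    /** Applies the converter and checks that the columns the dataset requires are present. */
    static Map<String, String> toRow(
            String payload,
            Function<String, Map<String, String>> converter,
            List<String> requiredColumns) {
        Map<String, String> row = converter.apply(payload);
        for (String column : requiredColumns) {
            if (!row.containsKey(column)) {
                throw new IllegalArgumentException("Missing column: " + column);
            }
        }
        return row;
    }
}
```

Validating the required columns after conversion keeps a misconfigured converter from silently feeding incomplete rows into the dataset.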

I'm less convinced about this: And for MQTT data publishing, there could be code which takes the data from ERDDAP (in standard format or data structure) and formats an MQTT payload. The disadvantage is: then, subscribing to an ERDDAP MQTT publisher wouldn't lead to consistently formatted data. Maybe it is better to have ERDDAP just publish in one format and leave it to external systems to reformat that as needed. I don't know enough about MQTT payloads (are there standard formats?) and other expectations in the MQTT world.

Feb 02 '21 16:02 BobSimons

@thogar-computer I hereby confirm that the CREAMT project is the first in the world to use EDDTableFromHttpGet. Congratulations!

Feb 02 '21 18:02 BobSimons

I'm not sure whether this is the right place, but we're looking to use ERDDAP together with a message queue (using AMQP or MQTT) to publish NetCDF-formatted data in (near) real time to the WMO Information System (WIS2.0). More information here:

https://github.com/wmo-im/WIS/issues/1

It would be great to see this added to ERDDAP without having to use a middle layer. There have been some discussions over message formats, see:

https://github.com/wmo-im/GTStoWIS2/tree/main/message_format

It might be the case that the message payloads need to be customizable for the different queues.

Feb 03 '21 10:02 david-i-berry

@DavidBerryNOC, this is a good place for information like that. Thanks. It will be important to design the ERDDAP MQTT support to match the needs of users.

Feb 03 '21 16:02 BobSimons

Hello, I thought I'd follow up to see if there has been any progress on this?

The development of the WMO WIS2.0 architecture is progressing and it looks like it has been settled to use MQTT for the messaging (still subject to approval). I'm also involved in a project developing a reference implementation for a "WIS2 node" that aims to lower the barriers to entry for exchanging data on the WIS2.0 and in WMO formats. As part of the discussions we've been looking at how / whether we could connect to ERDDAP and use that as a data source / feed into the WIS2.0. The potential workflow we've discussed is:

New data available on ERDDAP -> message to WIS2node (via MQTT) -> new data retrieved and converted to BUFR by node -> message to WIS2.0 advertising availability of new BUFR data.

(For info the project can be found at: https://wis2box.readthedocs.io/en/latest/overview.html)

Mar 24 '22 15:03 david-i-berry

Sorry. No progress. This project hasn't been assigned to anyone yet.

Mar 24 '22 15:03 BobSimons

No progress, but there is interest from BODC in collaborating on this. That would be good because it would help ensure that the result is useful and works well.

May 20 '22 18:05 BobSimons
