MQTT.js icon indicating copy to clipboard operation
MQTT.js copied to clipboard

Architectural improvements

Open robertsLando opened this issue 6 months ago • 14 comments

Continuation of discussion: https://github.com/mqttjs/MQTT.js/discussions/2000#discussioncomment-13390011

Below are the key points discussed:

  1. Key Goals Mentioned:

    • Zero external dependencies.
    • Improved browser-friendliness, particularly better bundling.
    • Support for various carriers, including sockets, websockets, webstreams, websocket streams, and web transport.
  2. Transport Plugins:

    • @robertsLando suggests making transport plugins tree-shakable by having users register plugins outside the core MQTT.js.
  3. Support for MQTTv5:

    • While @seriousme expresses concerns about MQTTv5's architectural design, @robertsLando confirms that MQTT.js already supports it and will continue to do so.
  4. Code Performance and Maintainability:

    • @seriousme prioritizes maintainability over performance but notes the importance of lightweight and widespread adoption of MQTT.
    • @robertsLando emphasizes maintaining or improving current performance levels while balancing with maintainability.
  5. Contributions from Opifex:

    • @seriousme offers to contribute code from the Opifex project, such as packet encoding/decoding and a socket-like interface for webstreams.
    • @robertsLando mentions the importance of ensuring compatibility and running tests if integrating these contributions.
  6. Next Steps:

    • @seriousme expresses interest in adding MQTTv5 client support and potentially browser support to Opifex.

@seriousme https://github.com/mqttjs/MQTT.js/discussions/2000#discussioncomment-13398834

I have put all benchmarking code into https://github.com/seriousme/mqtt-packet/tree/benchmark-publish-packet The script that creates the report is: https://github.com/seriousme/mqtt-packet/blob/benchmark-publish-packet/benchmarks/generateAll.js

It uses:

  • mqtt-packet
  • Opifex (https://github.com/seriousme/opifex)
  • pubPacket which uses new Uint8Array() to allocate and TextEncoder.encode() to encode strings (https://github.com/seriousme/mqtt-packet/blob/benchmark-publish-packet/benchmarks/raw/pubPacket.js)
  • pubPacketBuffer which uses Buffer.allocUnsafe() to allocate and Buffer.write() to encode strings (https://github.com/seriousme/mqtt-packet/blob/benchmark-publish-packet/benchmarks/raw/pubPacketBuffer.js) I think that we both agree that for high speed data processing:

@seriousme https://github.com/mqttjs/MQTT.js/discussions/2000#discussioncomment-13402104

I think that we both agree that for high speed data processing:

  1. memory allocations are expensive operations that should be avoided (hence pubPacketBuf only allocates once)
  2. data should be copied as little as possible
  3. list with pointers to buffers (either BL or the custom one in Opifex) help in archieving goals 1 and 2

What I learned from the whole experience is that:

  1. TextEncoder.encode() is a relatively expensive operation on NodeJS . I expected some impact, but the difference with Buffer.write is seriously big and other people have questions about its performance as well.
  2. Uint8Array.subarray() is also quite expensive operation when I expected it to be relatively cheap pointer recalculation only.
  3. Manual inlining code can still outrun automatic optimizations by the V8 compiler.

@seriousme

It seems like more people are complaining about TextEncoder/TextDecoder performance: https://github.com/whatwg/encoding/issues/343 I haven't found yet why subarray() is so expensive as at first glance I would only expect some simple pointer recalculation.

robertsLando avatar Jun 09 '25 12:06 robertsLando

@mcollina suggest to replace bl with a more performant version:

mqtt-packet does not return a Buffer or Uint8array, only an the BL object which has not been flattened yet.

This is the key point of how it's structured. There is absolutely no need to flatten it.

Note that bl got really slower over the years (it was faster) and we had to reimplement it recently for another project: https://github.com/platformatic/kafka/blob/main/src/protocol/dynamic-buffer.ts.

We plan to ship it as a new module soon.

robertsLando avatar Jun 09 '25 12:06 robertsLando

So I was not part of the original discussion but @robertsLando pointed me here.

I just want to say that reading this:

Zero external dependencies.

Makes me very happy. The dependency-free approach really helps the entire community.

SmashingQuasar avatar Jul 08 '25 13:07 SmashingQuasar

Zero external dependencies.

@robertsLando suggests making transport plugins tree-shakable by having users register plugins outside the core MQTT.js.

I'm also extremely interested in the combination of these two. When used on the backend with the default transport there's no reason why I should (indirectly) depend on readable-stream, ws, buffer and so on. I was planning on allocating a few hours to see whether I could craft a patchset for a "lite" version of the mqtt package for this exact reason but then found this issue.

jacoscaz avatar Aug 13 '25 13:08 jacoscaz

@jacoscaz feel free to submit a PR and/or a summary of the implementation you have in mind.

BTW I think the process to improve MQTTjs should start from mqtt-packet.

What I have in mind is to firstly convert it to TS then think about abstracting the stream layer so we could use web streams on browser and native nodejs streams in nodejs. Plus could be interesting to also use buffer on nodejs and uint8 arrays on browser so we can drop buffers polifilly as well.

Once this is done we can port the changes on MQTTjs an improve tree shaking

robertsLando avatar Aug 15 '25 23:08 robertsLando

FYI: maybe the following can help in the discussion:

Opifex uses the concept of a MQTT-conn which is a kind of socket that produces and consumes MQTT packet objects. The client and server create their own net/tls/etc sockets , and pass that to the packet parser/generator. This results in the MQTT-conn.

The current abstraction is webstreams with a shim to facilitate node-streams.This increases portability but webstreams are a bit slower on nodejs. Opifex uses iterators and generators as low as possible in the stack and it is also possibleto use them on Nodejs streams.

MQTT-packet of Opifex contains a pull parser. I.e. MQTT-conn reads packet length then reads the whole packet into a uint8array and passes this for the parser to process. This way of parsing has a few advantages:

  1. its a clear separation of concerns, all the transport stuff stays in the network layer and is not mixed with parsing, its very easy to reason about parsing as its just a uint8array and we don't need to bother with connections pausing/resuming/closing during parsing etc
  2. it only needs a few socket reads to get the whole packet.
  3. its easy to layer on top of anything as it just needs the uint8array
  4. its about 3x as fast as the current mqtt-packet parser.
  5. it does not produce events , it just parses the uint8array on request and throws errors if parsing failes, no race conditions etc.

The generator of Opifex MQTT-packet generates a list of bytearrays and assembels these in a single Uint8Array that can be written to the underlying transport. Same philosophy here as with the parser. The generator should not need to concern itself with the transport layer. (dying/choking clients etc). And it can be layered on top of anything that accepts Uint8Arrays.

Opifex MQTT conn can be found here https://github.com/seriousme/opifex/blob/main/mqttConn/mqttConn.ts

Opifex MQTT packet can be found here https://github.com/seriousme/opifex/blob/main/mqttPacket/mod.ts

I am currently working on adding MQTT V5 support to Opifex MQTT-packet ( its been quite a challenge to correctly type MQTT v5 properties on the various packets using generics, but it seems to be working :wink:) Once that is done I am happy to use that experience to help MQTTjs/mqtt-packet to migrate to Typescript. Mind you: I'm not suggesting to replace MQTTjs/mqtt-packet by Opifex MQTT-packet, but we might leverage the knowledge gained.

Hope this helps.

Kind regards, Hans

seriousme avatar Aug 16 '25 05:08 seriousme

Some sparse thoughts:

MQTT-packet of Opifex contains a pull parser. I.e. MQTT-conn reads packet length then reads the whole packet into a uint8array and passes this for the parser to process.

If I understand correctly, this sounds very similar to what I had in mind. A separate parser/client with a streaming interface that is as abstracted as possible in order to be usable across different stream implementations (node streams, web streams, async iterators, ...). This would then be built upon by transport-specific clients, producing a set of transport-specific packages (mqtt-ws, mqtt-ali, ...) all depending on one mqtt-core package.

The current abstraction is webstreams with a shim to facilitate node-streams.This increases portability but webstreams are a bit slower on nodejs.

Anecdotally and when it comes to CPU-bound stream processing, I've had very good experiences with the asynciterator package, which I use in quadstore to parse database records into JS object at more than 1.5M record/second and which proved significantly faster than every other stream implementation I tested.

BTW I think the process to improve MQTTjs should start from mqtt-packet.

As I wanted to get a feel of how gargantuan of a task this might be and also because I needed something quick for the next few weeks, I went ahead and forked this package into @jacoscaz/mqtt-smol, aiming to decrease the number of total dependencies as much as possible (my use case requires only the mqtt and mqtts protocols).

This required forking mqtt-packet (@jacoscaz/mqtt-packet-smol), which in turn required forking the bl package (@jacoscaz/bl-smol).

In doing so I stumbled into https://github.com/rvagg/bl/issues/150 , which highlights a significant difference in how each developer picks a different compromise between isomorphism and dependency count.

I am firmly in the camp that prefers to use tools like webpack and the likes to alias node's native modules to userland implementations (such as aliasing node:stream to readable-stream) when building for browser use, keeping the count of explicit dependencies (including indirect) to a minimum. Others prefer to explicitly depend on userland implementations, making life easier for both library authors and consumers at the cost of (potentially mismatched and/or duplicated) unnecessary dependencies when running server-side. There's pros and cons in both approaches but, on a personal level, I would strongly prefer to allocate effort to the former.

Mind you: I'm not suggesting to replace MQTTjs/mqtt-packet by Opifex MQTT-packet, but we might leverage the knowledge gained.

It does feel somewhat wasteful to spread ourselves thin across different MQTT stacks, though. I'm not saying we definitely should not do so but, given the very human propensity for the sunk cost fallacy, we should at least consider whether it would make sense for Opifex (or any other "new" MQTT implementation) to eventually replace MQTT.js or whether it would make sense for Opifex to act as an R&D project whose findings eventually find their way back to MQTT.js.

jacoscaz avatar Aug 16 '25 07:08 jacoscaz

Hi guys, I'm back :)

Thanks both for your comments here.

If I understand correctly, this sounds very similar to what I had in mind. A separate parser/client with a streaming interface that is as abstracted as possible in order to be usable across different stream implementations (node streams, web streams, async iterators, ...). This would then be built upon by transport-specific clients, producing a set of transport-specific packages (mqtt-ws, mqtt-ali, ...) all depending on one mqtt-core package.

This is also what I have in mind. This will allow to take advantage of NodeJS streams on NodeS side and Web streams on Browser side as NodeJS streams are much faster than Web streams in NodeJS environment. Ref: https://github.com/mqttjs/mqtt-packet/issues/151

It does feel somewhat wasteful to spread ourselves thin across different MQTT stacks, though. I'm not saying we definitely should not do so but, given the very human propensity for the sunk cost fallacy, we should at least consider whether it would make sense for Opifex (or any other "new" MQTT implementation) to eventually replace MQTT.js or whether it would make sense for Opifex to act as an R&D project whose findings eventually find their way back to MQTT.js.

IMO we should not try to replace MQTT.js. MQTT.js is a great library, has a lot of users and is battle-tested, asking users to switch would be a tough sell. I think the best way is to take all the learnings from Opifex and contribute back to MQTT.js.

robertsLando avatar Aug 27 '25 09:08 robertsLando

I might be able to take a stab at a proof of concept for this in a couple of weeks.

jacoscaz avatar Aug 27 '25 09:08 jacoscaz

FYI: to get a hint of where Opifex is going with MQTTv5 see: https://github.com/seriousme/opifex/tree/mqttv5/mqttPacket e.g.: https://github.com/seriousme/opifex/blob/mqttv5/mqttPacket/connect.ts I have about 8 more packets to upgrade and then it should support 3,4 and 5 at packet level.

Adding client level support seems rather straigthforward, adding full server support is a different topic!

seriousme avatar Aug 27 '25 17:08 seriousme

FYI: Small update: I think I have all of MQTTv5 in Opifex/mqttPacket. Server and client work with the updated packet format, but the protocol logic still only supports V4. I am now trying to port the tests of MQTTjs/mqtt-packet to see if I missed something. Typing of properties took quite some time as I wanted a single place to define them. The decoder verifies if properties actually belong to the packet type. The encoder will only encode properties that belong to the packet type. Same with reason codes.

seriousme avatar Sep 05 '25 15:09 seriousme

FYI: Another small update.

I have merged mqtt v5 support into Opifex main and have tested it against the tests in https://github.com/mqttjs/mqtt-packet/blob/master/test.js

I got all of the tests that were mappable working and even found a few nits in Opifex (e.g. checking if packetflags were empty, even when we don 't use them).

Some tests do not make sense in Opifex context (e.g. split packet tests) as the Opifex parser is only started once a full packet has been received. Opifex MQTTconn waits until it has enough data in to start parsing, which simplifies parsing and speeds it up as well.

As for mapping, the differences are in:

Naming

 MQTT standard Opifex MQTT-packet
packet type type cmd
message id id messageId
keep alive keepAlive keepalive
protocol name protocolName protocolID
protocol level protocolLevel protocolVersion
topic filter (on subscribe/unsub) topicFilter topic
no local (on subscribe) noLocal nl
retain handling (on subscribe) retainHandling rh
retain as published (on subscribe) retainAsPublished rap
reason code (sub ack/ unsub ack reasonCode granted
property: maximum qos maximumQos maximumQoS
property: subscription identifiers available subscriptionIdentifiersAvailable subscriptionIdentifierAvailable
property: user property userProperty userProperties

Field format

Item Opifex MQTT-packet
packetType/cmd number string
ping response Packettype.pingres "pingresp"
username string string | buffer
password Uint8Array string | buffer
subscriptionIdentifiers Array < number > Array < number > | number
userProperty Array<[string,string]> Record < string, string | Array < string > >

Opifex does not:

  • implement bridgeMode on Connect as I could not find any formal reference to it
  • return dup, retain and qos attributes on packets that don't use them.
  • return length as this is already known
  • emit an 'error' event, it throws an Error instead which makes more sense in Opifex context.

Finally,due to the way V5 properties are handled Opifex will always use the same order of encoding properties which differs from the order that MQTT-packet uses. So binary formats with properties contain the same data but not always in the same order. I tested this by decoding the MQTT-packet binary format and comparing it to the provided Object, then encoding and decoding again using opifex and checking if the same object was returned.

If you are interested one could clone Opifex/mqttPacket, and adjust its behaviour to match that of MQTT-packet using search/replace of field names and changing the field formats.

You could then wrap the opifex decoder in a parser which behaves like the MQTT parser ,i.e it accepts chunks, buffers them and either emits a packet or an error. The parser could accept Uint8Arrays and Nodejs. buffers depending on the environment (Browser or NodeJS. The encoder already uses a Uint8array list which now assembles it into one Uint8Array, but it could also be used to feed a write stream if preferred. Conversion from cmd (string) to packetType and vice versa could also be performed by the wrapper.

Like I mentioned before: I have no horse in this race. Just trying to help! Enjoy!

Kind regards, Hans

seriousme avatar Sep 20 '25 13:09 seriousme

@seriousme About bridge mode there is no official docs about that but unofficial ones (actual implementation is the same used by mosquito and all other brokers supporting mqtt bridge)

robertsLando avatar Sep 23 '25 16:09 robertsLando

@robertsLando

@seriousme About bridge mode there is no official docs about that but unofficial ones (actual implementation is the same used by mosquito and all other brokers supporting mqtt bridge)

Makes sense! Opifex mqttPacket on main now supports bridgeMode

Kind regards, Hans

seriousme avatar Sep 23 '25 18:09 seriousme

I might be able to take a stab at a proof of concept for this in a couple of weeks.

Quick follow-up on this: based on the progress made on Opifex over the last few days and @seriousme's super-quick turnarounds when it comes to reviewing issues and PRs, I think it might be better for me to build upon Opifex rather than work on refactoring MQTT.js.

My preference is mainly due to the fact that dependency count (incl. indirect dependencies) is extremely important to me. I want my apps to have as small of an attack surface as possible when it comes to supply chain attacks and I want to be able to manually review, inspect and even contribute to all of their dependencies, something which becomes really hard when an app depends on more than a couple dozens of packages. Compatibility with other JS runtimes is also something I'm quite keen on.

Therefore, I would rather focus on libraries that natively align with these priorities. Forcing them on well-established libraries such as MQTT.js, which were developed with different guiding principles, would take a lot of time, a lot of discussion and an enormous effort in ensuring backward compatibility.

jacoscaz avatar Oct 04 '25 09:10 jacoscaz