nDPI
Persistent Application Protocol IDs
Some people complain that numeric nDPI protocol IDs are not persistent. Protocols are removed from nDPI when they become obsolete (i.e. no longer present in modern traffic), because otherwise they occupy ID space that is never used. Keeping the list of protocols short makes nDPI more efficient and keeps the bitmaps shorter. The complaint is that once protocol X is removed, ID X can be reassigned to a completely different protocol.
Since this is a recurring concern, it is a good time to introduce persistent IDs that won't change and won't be recycled. This way every protocol will have a persistent ID that can be used where necessary in place of the current application protocol ID, which in essence will become just an internal ID.
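To make the bitmap argument concrete, here is a minimal sketch of a protocol bitmask whose size is fixed by the highest possible protocol ID. The names (`sketch_bitmask_t`, `SKETCH_MAX_PROTOCOLS`) and the limit of 512 are assumptions made for illustration, not nDPI's actual types or constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit, NOT nDPI's real constant. */
#define SKETCH_MAX_PROTOCOLS 512
#define SKETCH_WORD_BITS 32
#define SKETCH_NUM_WORDS \
  ((SKETCH_MAX_PROTOCOLS + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* One bit per protocol ID: the structure grows with the ID space,
 * whether or not every ID is still assigned to a live protocol. */
typedef struct {
  uint32_t words[SKETCH_NUM_WORDS];
} sketch_bitmask_t;

static void sketch_bitmask_reset(sketch_bitmask_t *b) {
  memset(b, 0, sizeof(*b));
}

static void sketch_bitmask_add(sketch_bitmask_t *b, uint16_t id) {
  assert(id < SKETCH_MAX_PROTOCOLS);
  b->words[id / SKETCH_WORD_BITS] |= (uint32_t)1 << (id % SKETCH_WORD_BITS);
}

static int sketch_bitmask_is_set(const sketch_bitmask_t *b, uint16_t id) {
  assert(id < SKETCH_MAX_PROTOCOLS);
  return (int)((b->words[id / SKETCH_WORD_BITS] >> (id % SKETCH_WORD_BITS)) & 1u);
}
```

With these assumed numbers the mask takes 64 bytes. If IDs were never reclaimed, the limit and therefore every such mask would have to keep growing, even for IDs that no live protocol uses.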
@lnslbrty @IvanNardi @aouinizied What do you think?
Internally to nDPI, it might not matter much that application IDs are reused. However, nDPI no longer stands alone: in the SDN world it is now used as part of larger systems. It's crucial that external systems are able to disambiguate applications, so application IDs must be permanent and long-lived, and must never be reused.
Two examples:
- IPFIX / netflow export. Application information can be exported by netflow or IPFIX as described in RFC 6759. IANA has allocated an application engine ID for nDPI¹. Re-using application IDs would result in flows being attributed to the wrong application.
  ¹ https://www.iana.org/assignments/ipfix/ipfix.xhtml#classification-engine-ids
- DANOS. DANOS² uses nDPI to identify applications. Firewall rules can be configured to allow or block the identified applications. Communication between the control plane and the dataplane requires persistent application IDs; re-using IDs would result in the wrong applications being configured.
  ² https://www.danosproject.org
Two proposals:
- Parity of internal and external IDs. Each new application is assigned a new ID. Old IDs are never re-used. ndpi_protocol_id_t grows monotonically. Each application only has a single ID which is used both internally and externally.
  + simple to understand
  - less efficient for nDPI?
- Separate internal and external IDs. Each application is allocated an external ID from a monotonically growing list. Mapping functions convert between the internal and external IDs. nDPI's internal IDs may be reused, but external IDs are never reused.
  + more efficient for nDPI?
  - extra code just for the external IDs
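The second proposal can be sketched as follows. This is only an illustration under stated assumptions: every name (`sketch_registry_t`, `sketch_register`, and so on) is hypothetical, and the runtime counter stands in for "the next unused entry in the external list". In a real library the external IDs would presumably be fixed constants in a published table so that they stay stable across releases.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INTERNAL 8 /* hypothetical number of internal slots */
#define SKETCH_EXT_NONE 0u    /* external ID 0 marks a free internal slot */

typedef struct {
  uint32_t ext_of_slot[SKETCH_MAX_INTERNAL]; /* internal ID -> external ID */
  uint32_t next_external;                    /* only ever grows */
} sketch_registry_t;

static void sketch_registry_init(sketch_registry_t *r) {
  for (int i = 0; i < SKETCH_MAX_INTERNAL; i++)
    r->ext_of_slot[i] = SKETCH_EXT_NONE;
  r->next_external = 1;
}

/* Register a protocol: take the first free internal slot (possibly a
 * recycled one) and give it a brand-new external ID.
 * Returns the internal ID, or -1 if no slot is free. */
static int sketch_register(sketch_registry_t *r) {
  for (int i = 0; i < SKETCH_MAX_INTERNAL; i++) {
    if (r->ext_of_slot[i] == SKETCH_EXT_NONE) {
      r->ext_of_slot[i] = r->next_external++;
      return i;
    }
  }
  return -1;
}

/* Retire a protocol: the internal slot becomes reusable, but its
 * external ID is never handed out again. */
static void sketch_retire(sketch_registry_t *r, int internal_id) {
  assert(internal_id >= 0 && internal_id < SKETCH_MAX_INTERNAL);
  r->ext_of_slot[internal_id] = SKETCH_EXT_NONE;
}

static uint32_t sketch_to_external(const sketch_registry_t *r, int internal_id) {
  assert(internal_id >= 0 && internal_id < SKETCH_MAX_INTERNAL);
  return r->ext_of_slot[internal_id];
}
```

In this sketch, registering two protocols, retiring the first and registering a third gives the third protocol the first one's internal slot but a new external ID, so an external consumer never sees an ID change meaning.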
@lucaderi My first thought was to extend the bitmaps, but I wasn't aware that this may harm nDPI performance. We can start from the currently implemented protocols as the seed for external_ids, keep changing the internal ones as we always have, and let external_ids grow without reuse or change. That's fair enough.
This option would also be interesting for protocols recognized using content matching automata. We would have core protocols (TLS, QUIC, STUN, etc.) that remain within the internal protocols_id and can be changed and reused. For external IDs we will no longer have a limit, so subprotocols based on content matching (e.g. TLS.Reddit) can be added to external_ids only. This implies some changes in how a match structure is declared within the automata and the content_match array, but it remains feasible.
We had the exact same issue and we solved it in the same way :-) Each protocol has an "internal-id" (used throughout the code, mutable, unknown to the user) and a "public-id" (fixed, exported to the user). You probably want to be sure that the public API only handles "public-ids". Some attention must be paid if you allow custom protocols defined by the user via configuration (not via code, recompiling the library). If you want to guarantee that their "public-ids" are immutable as well, you might want to consider the following scenarios:
- the user might expect that they don't collide with any "public-ids" of future native protocols
- the user might expect that they are immutable even if he updates nDPI library version
- the user might expect that they are immutable even if he updates his own custom protocol list and rules
I would also agree on a separation between internal and external/public ids. From a dev perspective, the mapping can be achieved with index tables where the numeric representation of an internal-id is an index to an external-id. The performance impact should be minimal. The internal-id -> external-id index tables require additional memory, but that won't be an issue because this additional memory consumption stays constant, i.e. it is independent of the number of flows being processed. I do not have an answer on the external-id to internal-id handling yet. An index table might not work if a user uses custom protocols. Maybe we should introduce a custom-protocol-only ID range, so we can easily check whether an ID is part of our internal-ids or not.
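A minimal sketch of the index-table idea, assuming a reserved range for custom protocols. The table contents, `SKETCH_CUSTOM_BASE` and all function names are made up for illustration and are not nDPI's API. The reverse lookup here is a plain linear scan, which is just one possible answer to the open external-id to internal-id question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical start of the custom-protocol-only range. */
#define SKETCH_CUSTOM_BASE 0x8000u

/* Index = internal ID, value = external ID (made-up numbers).
 * Constant size, independent of how many flows are processed. */
static const uint16_t sketch_int_to_ext[] = { 0, 7, 91, 188 };
#define SKETCH_NUM_INTERNAL \
  (sizeof(sketch_int_to_ext) / sizeof(sketch_int_to_ext[0]))

/* A single range check tells native and custom external IDs apart. */
static int sketch_is_custom(uint16_t external_id) {
  return external_id >= SKETCH_CUSTOM_BASE;
}

/* Invariant: no native external ID may fall in the custom range. */
static void sketch_check_table(void) {
  for (size_t i = 0; i < SKETCH_NUM_INTERNAL; i++)
    assert(!sketch_is_custom(sketch_int_to_ext[i]));
}

/* O(1) lookup; returns -1 for an unknown internal ID. */
static int sketch_internal_to_external(uint16_t internal_id) {
  return internal_id < SKETCH_NUM_INTERNAL ? sketch_int_to_ext[internal_id] : -1;
}

/* Linear scan over native IDs; custom IDs are rejected here and would
 * be resolved by a separate, user-populated table. Returns -1 if the
 * ID is custom or unknown. */
static int sketch_external_to_internal(uint16_t external_id) {
  if (sketch_is_custom(external_id))
    return -1;
  for (size_t i = 0; i < SKETCH_NUM_INTERNAL; i++)
    if (sketch_int_to_ext[i] == external_id)
      return (int)i;
  return -1;
}
```

If the reverse direction turned out to be hot, the scan could be replaced by a sorted table with binary search or a hash map; the range check for custom IDs would stay the same.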
I've added some example code here for discussion. What do you think?
This feature has now been implemented. You can, for instance, load custom protocols with high-numbered IDs that are persistent. nDPI still keeps an internal protocol ID representation (not persistent, and used only internally), but the IDs reported by nDPI are persistent and no longer need to be consecutive, as they had to be before this fix.
Thanks @lucaderi.