
Persistent Application Protocol IDs

Open lucaderi opened this issue 4 years ago • 6 comments

Some people complain that numeric nDPI protocol IDs are not persistent. Protocols are removed from nDPI when they become obsolete (i.e. no longer present in modern traffic), as they would otherwise occupy space that goes unused. Keeping the list of protocols short makes nDPI more efficient and keeps the bitmaps shorter. The complaint is that when protocol X is removed, its ID can later be reused by a totally different protocol.

Since some people do not like this, it is a good time to introduce persistent IDs that won't change and won't be recycled. This way every protocol will have a persistent ID that can be used if necessary instead of the current application protocol ID, which in essence will become just an internal ID.

lucaderi avatar Nov 26 '20 18:11 lucaderi

@lnslbrty @IvanNardi @aouinizied What do you think?

lucaderi avatar Nov 26 '20 18:11 lucaderi

Internally to nDPI, it might not matter much that application IDs are reused. However, nDPI no longer stands alone: in the SDN world it is now used as part of larger systems. It's crucial that external systems are able to disambiguate applications, so application IDs must be permanent and long-lived, and must never be re-used.

Two examples:

  1. IPFIX / NetFlow export. Application information can be exported by NetFlow or IPFIX as described in RFC 6759. IANA has allocated a classification engine ID for nDPI¹. Re-using application IDs would result in flows being attributed to the wrong application.
     ¹ https://www.iana.org/assignments/ipfix/ipfix.xhtml#classification-engine-ids

  2. DANOS. DANOS² uses nDPI to identify applications. Firewall rules can be configured to allow or block the identified applications. Communication between the control plane and the dataplane requires persistent application IDs; re-using IDs would result in the wrong applications being configured.
     ² https://www.danosproject.org

Two proposals:

  1. Parity of internal and external IDs. Each new application is assigned a new ID. Old IDs are never re-used. ndpi_protocol_id_t grows monotonically. Each application has only a single ID, which is used both internally and externally.
     + simple to understand
     - less efficient for nDPI?

  2. Separate internal and external IDs. Each application is allocated an external ID from a monotonically growing list. Mapping functions convert between the internal and external IDs. nDPI's internal IDs may be reused, but external IDs are never reused.
     + more efficient for nDPI?
     - extra code just for the external IDs
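
To make proposal 2 concrete, here is a minimal sketch of the two mapping functions in C. All names (internal_id, external_id, internal_to_external, external_to_internal) and all numeric values are hypothetical and chosen only for illustration; they are not nDPI symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and values throughout; none of these are nDPI symbols. */

/* Internal IDs: dense, free to be renumbered or recycled between releases. */
enum internal_id { INT_UNKNOWN = 0, INT_DNS, INT_TLS, INT_QUIC, INT_COUNT };

/* External IDs: append-only. A retired value is never handed out again. */
enum external_id {
  EXT_UNKNOWN = 0,
  EXT_DNS     = 1,
  /* 2 belonged to a protocol that was removed; it stays unused forever */
  EXT_TLS     = 3,
  EXT_QUIC    = 4,
  EXT_MAX     = 4
};

static_assert(EXT_MAX <= UINT16_MAX, "external IDs must fit in uint16_t");

/* Index table: the internal ID is the index, the external ID is the value. */
static const uint16_t int_to_ext[INT_COUNT] = {
  [INT_UNKNOWN] = EXT_UNKNOWN,
  [INT_DNS]     = EXT_DNS,
  [INT_TLS]     = EXT_TLS,
  [INT_QUIC]    = EXT_QUIC
};

/* Internal -> external: a bounds check plus one array read. */
uint16_t internal_to_external(unsigned internal) {
  return internal < INT_COUNT ? int_to_ext[internal] : EXT_UNKNOWN;
}

/* External -> internal: linear scan; retired or unknown IDs map to INT_UNKNOWN. */
unsigned external_to_internal(uint16_t external) {
  for (unsigned i = 1; i < INT_COUNT; i++)
    if (int_to_ext[i] == external)
      return i;
  return INT_UNKNOWN;
}
```

With this layout, removing a protocol only deletes its internal enumerator and table entry; its external number is left as a permanent gap.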

pjaitken avatar Nov 26 '20 20:11 pjaitken

@lucaderi My first thought was to extend the bitmaps, but I wasn't aware that this may harm nDPI performance. We can seed external_ids from the currently implemented protocols, keep changing the internal IDs as always, and let external_ids grow without reuse or change. That's fair enough.

This option would also be interesting for protocols recognized using content-matching automata. The core protocols (TLS, QUIC, STUN, etc.) would remain within the internal protocols_id, which can be changed and reused. External IDs would no longer have a limit, so subprotocols based on content matching (e.g. TLS.Reddit) can be added to external_ids only. This implies some changes in how a match structure is declared within the automata and the content_match array, but it remains feasible.

aouinizied avatar Nov 27 '20 01:11 aouinizied

We had the exact same issue and we solved it in the same way :-) Each protocol has an "internal-id" (used throughout the code, mutable, unknown to the user) and a "public-id" (fixed, exported to the user). You probably want to be sure that the public API only handles "public-ids". Some attention must be paid if you allow custom protocols defined by the user via configuration (not by recompiling the library). If you want to guarantee that their "public-ids" are immutable too, you might want to consider the following scenarios:

  • the user might expect that they don't collide with any "public-ids" of future native protocols
  • the user might expect that they are immutable even if he updates nDPI library version
  • the user might expect that they are immutable even if he updates his own custom protocol list and rules
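
One possible way to meet these expectations, sketched in C under stated assumptions: the user writes the "public-id" explicitly in the configuration (so it survives library updates and rule changes), and a reserved range keeps it away from native IDs. CUSTOM_ID_BASE, custom_registry and register_custom are hypothetical names, not part of nDPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: public IDs below this value are reserved for native protocols,
 * so a future native protocol can never collide with a user-chosen ID. */
#define CUSTOM_ID_BASE 0x8000u
#define MAX_CUSTOM     64u

struct custom_proto {
  uint16_t public_id;   /* chosen by the user in the config, never derived */
  char     name[32];
};

struct custom_registry {
  struct custom_proto entries[MAX_CUSTOM];
  size_t              count;
};

enum reg_status { REG_OK, REG_RESERVED_RANGE, REG_DUPLICATE_ID, REG_FULL };

/* Register one custom protocol read from the user's configuration. */
enum reg_status register_custom(struct custom_registry *r,
                                uint16_t public_id, const char *name) {
  assert(r != NULL && name != NULL);

  if (public_id < CUSTOM_ID_BASE)
    return REG_RESERVED_RANGE;          /* would shadow a native public ID */

  for (size_t i = 0; i < r->count; i++)
    if (r->entries[i].public_id == public_id)
      return REG_DUPLICATE_ID;          /* two rules claim the same ID */

  if (r->count == MAX_CUSTOM)
    return REG_FULL;

  struct custom_proto *p = &r->entries[r->count++];
  p->public_id = public_id;
  snprintf(p->name, sizeof p->name, "%s", name);  /* truncates long names */
  return REG_OK;
}
```

Because the ID comes from the configuration rather than from the load order, adding or reordering rules does not renumber existing custom protocols.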

IvanNardi avatar Nov 28 '20 15:11 IvanNardi

I would also agree on a separation between internal and external/public IDs. From a dev perspective, the mapping can be achieved with index tables, where the numeric value of an internal-id is the index of the entry holding its external-id. The performance impact should be minimal. The internal-id -> external-id index tables require some additional memory, but that won't be an issue because this consumption stays constant, i.e. independent of the number of flows currently being processed. I do not have an answer for the external-id to internal-id handling yet. An index table might not work if a user uses custom protocols. Maybe we should introduce a custom-protocol-only ID range, so we can easily check whether an ID is one of our own or not.
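
As a rough illustration of the reverse direction, the C sketch below combines a dense index table for native external IDs with a range check for custom ones. The range boundaries, table contents and the name lookup_internal are all assumptions made for this example, not nDPI code.

```c
#include <assert.h>
#include <stdint.h>

/* All constants and names below are hypothetical. */
#define INTERNAL_UNKNOWN      0u
#define NATIVE_EXT_LIMIT      8u       /* native external IDs are < this   */
#define CUSTOM_EXT_BASE       0x8000u  /* custom external IDs are >= this  */
#define CUSTOM_SLOTS          4u
#define FIRST_CUSTOM_INTERNAL 100u

static_assert(CUSTOM_EXT_BASE > NATIVE_EXT_LIMIT, "ID ranges must not overlap");

/* Reverse index table: native external ID -> internal ID.
 * Entries left at 0 are retired or unassigned external IDs.
 * Its size depends on the number of protocols, not on the number of flows. */
static const uint16_t native_ext_to_int[NATIVE_EXT_LIMIT] = {
  [1] = 1, [3] = 2, [4] = 3
};

/* Custom external IDs, as they might look after loading a user config.
 * Slot i corresponds to internal ID FIRST_CUSTOM_INTERNAL + i. */
static uint16_t custom_ext[CUSTOM_SLOTS] = { 0x8001, 0x9000 };
static unsigned custom_count = 2;

uint16_t lookup_internal(uint16_t ext) {
  if (ext < NATIVE_EXT_LIMIT)
    return native_ext_to_int[ext];            /* one array read */

  if (ext >= CUSTOM_EXT_BASE) {
    /* Custom IDs are sparse, so scan the short list instead of indexing. */
    for (unsigned i = 0; i < custom_count; i++)
      if (custom_ext[i] == ext)
        return (uint16_t)(FIRST_CUSTOM_INTERNAL + i);
  }

  return INTERNAL_UNKNOWN;                    /* gap between the two ranges */
}
```

The single comparison against CUSTOM_EXT_BASE is what tells native and custom IDs apart, so the dense table never has to cover the sparse custom range.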

utoni avatar Nov 30 '20 11:11 utoni

I've added some example code here for discussion. What do you think?

pjaitken avatar Dec 15 '20 22:12 pjaitken

This feature has now been implemented. You can, for instance, load custom protocols with high-numbered IDs that can be persistent. nDPI still uses an internal protocol ID representation (not persistent and used only internally), but the IDs reported by nDPI are persistent and no longer need to be consecutive, as they did until this fix.

lucaderi avatar Jun 13 '23 15:06 lucaderi

Thanks @lucaderi.

pjaitken avatar Jun 29 '23 11:06 pjaitken