Separate `RawPacket` and `MBufRawPacket`
Preface
Currently there are 2 main "raw" packet container types: RawPacket and MBufRawPacket. Both are specialized in different use cases:
- RawPacket: Specialized to hold a buffer on the free store or with external storage duration (user must ensure lifetime).
- MBufRawPacket: Specialized to utilize DPDK's MBuf infrastructure and read/write data to it.
The problem - RawPacket is currently works as both "concrete class" and "base class"
In the majority of the library, a pointer to RawPacket is used where a packet data container is needed. This leads to the current implementation of MBufRawPacket subclassing from RawPacket to be interchangeable.
The primary issue with this design choice is that RawPacket now has double duty of both being a final implementation class and an abstract base class. The need to both have it's own functionality and the need to accommodate subclassing leads to maintainability issues and hard to read code, especially in the copy operations.
Proposed solution
A potential solution would be to extract a base class (or interface) over RawPacket and MBufRawPacket that defines the common interface or functionality between the packets. This disentangles the need for both classes to work around each other, simplifying the code.
I remember I had these thoughts when I created MBufRawPacket, but I'm not sure creating an interface or a base class is really needed. MBufRawPacket overrides the necessary functions of RawPacket and is more of a niche use case needed for DPDK.
This leads to the current implementation of
MBufRawPacketsubclassing fromRawPacketto be interchangeable.
Where do you see it as a problem or a blocker to a feature we want to implement but can't?
The primary issues I have in mind would be from the internal capacity tracking and buffer policies described in #1888. It can possibly be worked around that but it would be messy and probably less maintainable in the long run as every change in RawPacket must be checked so it does not break something in MBufRawPacket.
Since we only have 2 classes RawPacket and MBufRawPacket and most of the code uses RawPacket, I don't think it'd be very hard to maintain both. If we add more types of raw packets in the future it may be more difficult to maintain, but I don't currently see a feature that would need that. If we change RawPacket's behavior we have to accomodate these changes in both classes anyway in order for everything to work correctly
If we change RawPacket's behavior we have to accomodate these changes in both classes anyway in order for everything to work correctly
Tbh, that is also one of the issues. That the external behaviour needs to match is a given. But currently internal implementation also needs to match, due to tight coupling, which limits the ways the internals can be changed and increases fragility.
It can also help with assumptions about the classes and their capabilities. For example, in the draft I have for buffer policies:
-
MBufRawPacketwould only support policiesCopyandMove, (maybeSoftReference, but that will be promoted toCopy) due to the fact that it needs the packet data in MBuf memory, not in free store memory. -
RawPacketon the other hand, can support all policies as it is specialized to operate on free store memory.
Having them separate ensures that if you have a RawPacket pointer, it can always operate on all policies. The checks only need to be made if you operate on a base pointer.
Separation also reduces the logical reasoning that needs to be made for each class, leading to easier code maintenance.
- Base class
RawPacketBaseoperates on abstract data buffer - irrespective of storage. -
RawPacketalways operates on free store or automatic storage buffers. -
MBufPacketalways operates onMBufmemory.
IMO, mashing the requirements of 1. and 2. in a single class and then having a 3. override 2.'s functions is much harder to reason about than having them distinct.
Another issue is about the internal capacity tracking. Generally I think this will be implemented via size_t m_Capacity member in RawPacket. If MBufRawPacket derives from RawPacket that capacity would have to be exposed to derived classes, breaking encapsulation (either via setter or directly) and making it so the base class can't rely on it being invariant in RawPacket operations that call overriden methods, without also checking derived implementations. This is also incidentally why the core guidelines recommend protected data be avoided. [1]
I don't disagree with your claims, I just think it'd be a lot of work that is not really needed and might create additional bugs in a sensitive part of the library. If we had a plan to make 2-3 more types of RawPacket I'd say we have no choice and have to go through this refactoring, but since I don't see that happening any time soon, I think it'd be redundant.
We should consider when refactoring is needed for concrete extensibility requirements or to make the code simpler/shorter, vs. do a refactoring without any concrete need...
Sure, I guess we can table this suggestion until there is a need.
I can work on the other suggestion points and try to make them work without the base class at first?
@seladb Btw, something to note, the way the API is written it allows accidental object slicing due to RawPacket having public copy assignment ctor (which it kinda needs to have for easy API on RawPacket itself).
pcpp::MBufRawPacket* mBufPacket1; // Assume allocated from somewhere.
pcpp::MBufRawPacket* randomPacketInstance; // Can be RawPacket, it doesn't matter.
pcpp::RawPacket* mBufAsRawPacket1 = mBufPacket1; // Usage through base pointer.
// RawPacket& operator=(const RawPacket& other) is used here, which uses copyDataFrom(other, true).
*mBufAsRawPacket1 = *randomPacketInstance;
@seladb Reopening this as blocking of #1910.
Maintaining capacity invariants becomes a massive headache, especially with the value copy semantics of RawPacket and having to track 2 separate interleaved implementations while making the change.