PcapPlusPlus icon indicating copy to clipboard operation
PcapPlusPlus copied to clipboard

Added code to make pointer vector be thread safe

Open muthusaravanan3186 opened this issue 1 year ago • 30 comments
trafficstars

Added code to make pointer vector be thread safe. Used mutex to lock every operation done on the pointer vector.

muthusaravanan3186 avatar Feb 26 '24 20:02 muthusaravanan3186

Kinda curious, is there a particular issue that requires the vector being thread safe?

Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter that can be provided a struct implementing no-op lock/unlock/try_unlock methods instead when thread safety is not a requirement, saving the mutex overhead.

Dimi1010 avatar Feb 27 '24 00:02 Dimi1010

We had a requirement when we need to read the packet simultaneously for some kind of edge processing. Yes mutex may be overhead in non-thread environment.

muthusaravanan3186 avatar Feb 27 '24 01:02 muthusaravanan3186

Codecov Report

Attention: Patch coverage is 70.17544% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 82.35%. Comparing base (c551f40) to head (03989cc).

Files Patch % Lines
Common++/header/PointerVector.h 70.17% 15 Missing and 2 partials :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #1325   +/-   ##
=======================================
  Coverage   82.34%   82.35%           
=======================================
  Files         163      163           
  Lines       20937    20975   +38     
  Branches     7906     7910    +4     
=======================================
+ Hits        17241    17273   +32     
- Misses       3025     3030    +5     
- Partials      671      672    +1     
Flag Coverage Δ
alpine317 72.41% <12.50%> (-0.01%) :arrow_down:
fedora37 72.42% <12.50%> (+0.04%) :arrow_up:
macos-12 61.42% <67.44%> (+0.01%) :arrow_up:
macos-13 60.46% <67.44%> (+0.02%) :arrow_up:
macos-14 60.46% <67.44%> (+0.02%) :arrow_up:
macos-ventura 61.49% <72.50%> (+0.01%) :arrow_up:
mingw32 70.28% <12.50%> (-0.03%) :arrow_down:
mingw64 70.29% <12.50%> (-0.04%) :arrow_down:
npcap 83.32% <71.73%> (-0.04%) :arrow_down:
rhel93 72.44% <12.50%> (-0.02%) :arrow_down:
ubuntu1804 74.74% <30.00%> (-0.01%) :arrow_down:
ubuntu2004 73.20% <27.27%> (-0.02%) :arrow_down:
ubuntu2204 72.24% <12.50%> (-0.01%) :arrow_down:
ubuntu2204-icpx 59.04% <67.44%> (+0.06%) :arrow_up:
unittest 82.35% <70.17%> (+<0.01%) :arrow_up:
windows-2019 83.35% <69.38%> (-0.02%) :arrow_down:
windows-2022 83.34% <71.73%> (-0.04%) :arrow_down:
winpcap 83.33% <69.38%> (+0.01%) :arrow_up:
xdp 59.08% <12.50%> (-0.01%) :arrow_down:
zstd 73.82% <70.73%> (+0.03%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 28 '24 06:02 codecov[bot]

Kinda curious, is there a particular issue that requires the vector being thread safe?

Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter that can be provided a struct implementing no-op lock/unlock/try_unlock methods instead when thread safety is not a requirement, saving the mutex overhead.

I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex

seladb avatar Feb 28 '24 08:02 seladb

Kinda curious, is there a particular issue that requires the vector being thread safe? Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter that can be provided a struct implementing no-op lock/unlock/try_unlock methods instead when thread safety is not a requirement, saving the mutex overhead.

I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex

I think if we want to control using lock or not, maybe consider using ifdef USE_LOCK_POINTER_VECTOR preprocessor directive, to prevent run-time branch overhead.

tigercosmos avatar Feb 28 '24 08:02 tigercosmos

Kinda curious, is there a particular issue that requires the vector being thread safe? Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter that can be provided a struct implementing no-op lock/unlock/try_unlock methods instead when thread safety is not a requirement, saving the mutex overhead.

I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex

I think if we want to control using lock or not, maybe consider using ifdef USE_LOCK_POINTER_VECTOR preprocessor directive, to prevent run-time branch overhead.

adding an additional if before locking the mutex shouldn't have a big performance penalty I think 🤔

seladb avatar Feb 28 '24 09:02 seladb

@seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. Ends up requiring a template but you don't really have any runtime overhead for the non-mutex version, and it is not a global switch of all or nothing like a preprocessor directive would be.

Example from spdlog: https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40 https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27

Dimi1010 avatar Feb 28 '24 09:02 Dimi1010

@seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. Ends up requiring a template but you don't really have any runtime overhead for the non-mutex version, and it is not a global switch of all or nothing like a preprocessor directive would be.

Example from spdlog: https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40 https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27

I'm not sure implementing a null_mutex has less performance penalty than adding a simple if... 🤔

seladb avatar Feb 28 '24 09:02 seladb

@seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. Ends up requiring a template but you don't really have any runtime overhead for the non-mutex version, and it is not a global switch of all or nothing like a preprocessor directive would be. Example from spdlog: https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40 https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27

I'm not sure implementing a null_mutex has less performance penalty than adding a simple if... 🤔

Might be, might not. They have null_mutex implemented like this:

struct null_mutex {
    void lock() const {}
    void unlock() const {}
};

whose lock/unlock compile down to just a return instruction back to caller,

The benefit over the if approach is that when you need a non-thread safe version, you don't need to construct and store a mutex anyway in the struct and it removes the need for ifs before every lock/unlock operation. It also allows you to actually use the lock_guard as you can't really just use it cleanly if you have an if block for wanting to lock and then the guard has to keep existing after the if block exists.

I think the potential minor performance cost of a function call over an if statement is worth it, from the point of code readability and maintainability perspective.

Dimi1010 avatar Feb 28 '24 10:02 Dimi1010

@Dimi1010 's proposal makes sense to me, especially readability and maintainability.

tigercosmos avatar Feb 28 '24 10:02 tigercosmos

are we fine with above implementation of using null mutex. If so i can go and make the necessary changes

muthusaravanan3186 avatar Feb 28 '24 18:02 muthusaravanan3186

are we fine with above implementation of using null mutex. If so i can go and make the necessary changes

Yes, I think we can go with this approach!

Thank you @Dimi1010 for jumping in and proposing it!

seladb avatar Feb 28 '24 19:02 seladb

@muthusaravanan3186 would you like to finish this PR?

tigercosmos avatar Mar 15 '24 06:03 tigercosmos

@muthusaravanan3186 would you like to finish this PR?

@tigercosmos Our requirement was single producer and single consumer. But based on review comments it seems we need to address multiple consumer too. I just want to know what is the expectation here. If we need to change the PointerVector to hold shared ptr references; then it will be big change.. Please let me know what is expected here.

muthusaravanan3186 avatar Mar 15 '24 18:03 muthusaravanan3186

@muthusaravanan3186 would you like to finish this PR?

@tigercosmos Our requirement was single producer and single consumer. But based on review comments it seems we need to address multiple consumer too. I just want to know what is the expectation here. If we need to change the PointerVector to hold shared ptr references; then it will be big change.. Please let me know what is expected here.

@muthusaravanan3186 we can do it in phases. Make some changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation

seladb avatar Mar 17 '24 22:03 seladb

@seladb can you review now

muthusaravanan3186 avatar Mar 26 '24 03:03 muthusaravanan3186

@muthusaravanan3186 could you resolve the conflicts?

tigercosmos avatar Mar 26 '24 04:03 tigercosmos

@seladb i resolved all the conflicts. But on one of the conflict i am getting the below message Only those with write access to this repository can merge pull requests.

muthusaravanan3186 avatar Mar 27 '24 04:03 muthusaravanan3186

@seladb i resolved all the conflicts. But on one of the conflict i am getting the below message Only those with write access to this repository can merge pull requests.

Not sure what did you do with the merge, but you can resolve the conflicts at your forked repo. Basically you can do the followings:

  1. set https://github.com/seladb/PcapPlusPlus/ as upstream: git remote upstream https://github.com/seladb/PcapPlusPlus/
  2. fetch upstream: git fetch upstream dev
  3. git merge upstream dev
  4. there will be some conflicts showing by git, and you need to resolve them.

tigercosmos avatar Mar 27 '24 05:03 tigercosmos

@muthusaravanan3186 you can look at this StackOverflow question: https://stackoverflow.com/questions/38949951/how-to-solve-merge-conflicts-across-forks

Please notice that you should resolve conflicts between origin/dev and muthusaravanan3186/dev (not between the master branches of origin and your fork)

seladb avatar Mar 27 '24 05:03 seladb

@seladb Resolved all the conflicts.

muthusaravanan3186 avatar Apr 04 '24 03:04 muthusaravanan3186

@tigercosmos There is no WIP PR yet; as discussed plan is to merge this PR and start to work on use of shared_ptr.

muthusaravanan3186 avatar Apr 08 '24 19:04 muthusaravanan3186

@tigercosmos There is no WIP PR yet; as discussed plan is to merge this PR and start to work on use of shared_ptr.

I am okay If you want to add the support of shared_ptr, why not add it in this PR?

template<typename T, typename Mutex=pcpp::NullMutex, typename PointerType=std::add_pointer_t<T> >
class PointerVector {

		void pushBack(PointerType element)
		{
			std::lock_guard<Mutex> lk(m_Mutex);
			m_Vector.push_back(element);
		}

		PointerType front()
		{
			std::lock_guard<Mutex> lk(m_Mutex);
			return m_Vector.front();
		}
}

tigercosmos avatar Apr 09 '24 00:04 tigercosmos

As discussed this was the plan

@muthusaravanan3186 we can do it in phases. Make some changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation

muthusaravanan3186 avatar Apr 09 '24 21:04 muthusaravanan3186

As discussed this was the plan

@muthusaravanan3186 we can do it in phases. Make some changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation

Then I oppose adding unnecessary locks for the future support of the shared pointer, because (1) those locks are useless for T* (2) using the shared pointer is not on the plan. Let's focus on handling the case that uses T*, and forget about the share pointers at this moment.

tigercosmos avatar Apr 10 '24 01:04 tigercosmos

@tigercosmos Back to square one. we have null mutex for normal case and mutex for multi-threading case. This was discussed. Since it is vector of pointers and we are accessing the pointers which itself is not thread safe. Since changing Pointervector to handle shared Pointers is a huge change; we planned to implement in phases. If we need everything at single shot it may take time. Please let me know what is the expectation. If not will drop this PR and work on something else. We already modified PCAPPLUSPLUS according to our need.

muthusaravanan3186 avatar Apr 10 '24 23:04 muthusaravanan3186

I cannot make decision for this, let's wait for @seladb

tigercosmos avatar Apr 11 '24 01:04 tigercosmos

@muthusaravanan3186 @tigercosmos @Dimi1010 I'm trying to catch up here... I think we need to decide on 2 issues, please correct me if I'm wrong:

  1. The iterator is not thread-safe because after the user receives it the vector can change and invalidate the iterator
  2. The vector holds pointers that are not thread-safe by nature, meaning one thread can hold the pointer and another thread can delete it from the vector which will free the pointer

Which issue(s) are you concerned about and is blocking this PR?

Regarding (1) - do you have any (simple) suggestion on how to solve it? Regarding (2) - we need to use a std::shared_ptr which is indeed a bigger and a breaking change

seladb avatar Apr 17 '24 07:04 seladb

@seladb @tigercosmos @muthusaravanan3186

Tbh, I don't really see an easy solution to (1) while keeping the current API of the class. The two ways I could see iteration being done both have problems:

  • You either need custom iterator objects that hold a reference to the vector mutex or some other kind of black magic, and even then mutating though the iterator would require custom calls.
  • You remove the iteration capability from the 'thread-safe' version and add a thread-safe method that makes a 'non-thread-safe' snapshot of the objects in the vector at time of call and return that to the calling thread to do as it sees fit.

Regarding point 2. Using std::unique_ptr and std::shared_ptr would remove the need for manual allocation tracking, so that would be useful, and std::shared_ptr would work in removing the problem of dangling pointers after removal from the vector.

With all that said, I am personally of the opinion that the burden of thread safety should not be placed on the vector object itself, but on the program that requires the vector data to be accessed by multiple threads simultaneously. My reasoning is that the vector and raw arrays in general provide way too many ways to access or mutate the data to be easily secured, while the program attempting to access the data from different threads would have a far better knowledge of how the data is being accessed to provide thread-safe data access and avoiding race conditions.

A simpler solution would be having the different threads make a local copy of the vector contents while under synchronization and then proceeding independently using their local copies. This type of solution minimizes the points of contention where sync lock needs to be acquired between the threads, as the alternative would require a sync lock on every element access or mutation. It also removes the need of the underlying objects that are being held by the array being thread-safe too.

Dimi1010 avatar Apr 17 '24 11:04 Dimi1010

After reading @Dimi1010 's response I started wondering what is the use case of a thread-safe pointer vector if both (1) and (2) are not solved...

In general, a thread-safe container is only needed when one or more threads are changing the vector, usually while other threads are reading data from it:

  • If (1) is not solved then the threads can't use an iterator (in a for loop for example), so the only way to access the vector data is by using at() or front(). While that's possible, it's not very convenient
  • If (2) is not solved then no thread can remove elements from the vector, otherwise other threads may hold invalid pointers. That's a pretty major limitation that narrows down many of the use-cases for using the vector in a multithread environment

@muthusaravanan3186 can you share your use-case for thread-safe pointer vector? It might help us decide if we actually want to implement it or not

seladb avatar Apr 18 '24 06:04 seladb