[DAPS-1395] Address Memory Leaks in Core Server
PR Description
This PR addresses several memory bugs discovered in the common library and the core server. Though the source code coverage is not exhaustive, it should significantly reduce the need to restart the core server. The main source of error and of bugs is related first to the absence of virtual destructors in derived classes that contain unique_ptrs. By not including them, the base destructor was being called which was failing to cleanup the memory. In addition, cleanup of the ZeroMQ context was addressed. A special note, the ZeroMQ context should be the same when using INPROC and otherwise it is better if it is different for TCP connections to avoid thread contention.
Tasks
- [x] - A description of the PR has been provided, and a diagram included if it is a new feature.
- [x] - Formatter has been run
- [ ] - CHANGELOG comment has been added
- [x] - Labels have been assigned to the pr
- [x] - A reviwer has been added
- [x] - A user has been assigned to work on the pr
- [ ] - If new feature a unit test has been added
Summary by Sourcery
Address memory leaks in the core server and common library by enforcing virtual destructors and scoped ZeroMQ context management, while enhancing the test suite with Protobuf teardown and enabling valgrind-based memory tests via new CMake options.
New Features:
- Add ENABLE_MEMORY_TESTS option and integrate valgrind-based memory tests in CMake
Bug Fixes:
- Fix memory leaks by adding virtual destructors to key interfaces and classes
- Ensure proper cleanup of ZeroMQ contexts to eliminate context leaks
Enhancements:
- Introduce InprocContext for shared INPROC ZeroMQ context management with reference counting
- Update ZeroMQCommunicator (and secure variant) and ProxyBasicZMQ to use InprocContext for context lifecycle
Build:
- Add CMake flags and conditional memory test targets for valgrind leak detection
Tests:
- Add global BOOST_GLOBAL_FIXTURE for ProtobufLibrary shutdown in multiple unit tests
Reviewer's Guide
This PR refactors ZeroMQ context management by introducing an InprocContext for scheme-based creation and cleanup, adds missing virtual destructors across core interfaces and classes to ensure proper resource release, integrates global Protobuf shutdown fixtures in unit tests to prevent library leaks, updates CMake test configurations to support optional valgrind-based memory tests, and enhances error handling during ZeroMQ socket initialization.
Sequence Diagram: ZeroMQ INPROC Context Lifecycle
sequenceDiagram
title "Sequence Diagram: ZeroMQ INPROC Context Lifecycle"
participant Comm as ZeroMQCommunicator
participant InprocCtx as InprocContext
Comm->>InprocCtx: getContext()
activate InprocCtx
Note over InprocCtx: Returns shared context (creates if null)
InprocCtx-->>Comm: context_ptr
deactivate InprocCtx
Comm->>InprocCtx: increment()
activate InprocCtx
Note over InprocCtx: Increments usage counter
InprocCtx-->>Comm:
deactivate InprocCtx
Note over Comm: Communicator operates...
Comm->>InprocCtx: decrement()
activate InprocCtx
Note over InprocCtx: Decrements usage counter
InprocCtx-->>Comm:
deactivate InprocCtx
Comm->>InprocCtx: get() (to check counter)
activate InprocCtx
InprocCtx-->>Comm: current_counter_value
deactivate InprocCtx
alt if counter == 0
Comm->>InprocCtx: resetContext()
activate InprocCtx
Note over InprocCtx: Terminates and nullifies shared context
InprocCtx-->>Comm: status_code
deactivate InprocCtx
end
Sequence Diagram: ZeroMQ Non-INPROC Context Lifecycle
sequenceDiagram
title "Sequence Diagram: ZeroMQ Non-INPROC Context Lifecycle"
participant Comm as ZeroMQCommunicator
participant ZMQ_Lib as ZeroMQ Library
Note over Comm: Constructor for non-INPROC socket
Comm->>ZMQ_Lib: zmq_ctx_new()
activate ZMQ_Lib
ZMQ_Lib-->>Comm: new_context_ptr
deactivate ZMQ_Lib
Note over Comm: Communicator operates with its own context...
Note over Comm: Destructor
Comm->>ZMQ_Lib: zmq_ctx_term(context_ptr)
activate ZMQ_Lib
ZMQ_Lib-->>Comm: status_code
deactivate ZMQ_Lib
File-Level Changes
| Change | Details | Files |
|---|---|---|
| Scheme-based ZeroMQ context lifecycle management |
|
common/source/support/zeromq/Context.hppcommon/source/communicators/ZeroMQCommunicator.cppcommon/source/communicators/ZeroMQCommunicatorSecure.cppcommon/source/servers/ProxyBasicZMQ.cpp |
| Add missing virtual destructors to polymorphic classes |
|
common/include/common/ISocket.hppcommon/source/sockets/ZeroMQSocket.hppcommon/include/common/ICredentials.hppcommon/include/common/IMessage.hppcommon/include/common/IOperator.hppcommon/include/common/IServer.hppcommon/source/credentials/ZeroMQSocketCredentials.hppcommon/source/messages/GoogleProtoMessage.hppcommon/source/operators/AuthenticationOperator.hppcommon/source/operators/RouterBookKeepingOperator.hppcommon/source/communicators/ZeroMQCommunicatorSecure.hppcommon/source/servers/Proxy.hppcommon/source/servers/ProxyBasicZMQ.hpp |
| Global Protobuf shutdown fixture in unit tests |
|
common/tests/unit/test_Buffer.cppcommon/tests/unit/test_CommunicatorFactory.cppcommon/tests/unit/test_Frame.cppcommon/tests/unit/test_MessageFactory.cppcommon/tests/unit/test_OperatorFactory.cppcommon/tests/unit/test_ProtoBufFactory.cppcommon/tests/unit/test_ProtoBufMap.cppcommon/tests/unit/test_ProxyBasicZMQ.cppcommon/tests/unit/test_Proxy.cppcore/server/tests/unit/test_AuthMap.cppcore/server/tests/unit/test_AuthenticationManager.cpprepository/gridftp/globus5/authz/tests/unit/test_AuthzWorker.cpprepository/gridftp/globus5/authz/tests/unit/test_getVersion.cpp |
| Conditional memory testing in CMake |
|
common/tests/unit/CMakeLists.txtcore/server/tests/unit/CMakeLists.txtcommon/tests/CMakeLists.txtcore/server/tests/CMakeLists.txtCMakeLists.txt |
| Enhanced error handling for ZeroMQ socket creation |
|
common/source/communicators/ZeroMQCommunicatorSecure.cpp |
Possibly linked issues
- #1395: PR addresses memory leaks by adding virtual destructors and ZeroMQ context cleanup.
Tips and commands
Interacting with Sourcery
-
Trigger a new review: Comment
@sourcery-ai reviewon the pull request. - Continue discussions: Reply directly to Sourcery's review comments.
-
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with
@sourcery-ai issueto create an issue from it. -
Generate a pull request title: Write
@sourcery-aianywhere in the pull request title to generate a title at any time. You can also comment@sourcery-ai titleon the pull request to (re-)generate the title at any time. -
Generate a pull request summary: Write
@sourcery-ai summaryanywhere in the pull request body to generate a PR summary at any time exactly where you want it. You can also comment@sourcery-ai summaryon the pull request to (re-)generate the summary at any time. -
Generate reviewer's guide: Comment
@sourcery-ai guideon the pull request to (re-)generate the reviewer's guide at any time. -
Resolve all Sourcery comments: Comment
@sourcery-ai resolveon the pull request to resolve all Sourcery comments. Useful if you've already addressed all the comments and don't want to see them anymore. -
Dismiss all Sourcery reviews: Comment
@sourcery-ai dismisson the pull request to dismiss all existing Sourcery reviews. Especially useful if you want to start fresh with a new review - don't forget to comment@sourcery-ai reviewto trigger a new review!
Customizing Your Experience
Access your dashboard to:
- Enable or disable review features such as the Sourcery-generated pull request summary, the reviewer's guide, and others.
- Change the review language.
- Add, remove or edit custom review instructions.
- Adjust other review settings.
Getting Help
- Contact our support team for questions or feedback.
- Visit our documentation for detailed guides and information.
- Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.
@par-hermes format
Observing test failure when same test is run in succession. First time passes every time after that it fails.
LD_LIBRARY_PATH=/opt/datafed/dependencies/lib /opt/datafed/dependencies/bin/ctest -R mock
Test project /home/brownjs/software/open/datafed/DataFed/build-memory-2
Start 33: start_mock
1/3 Test #33: start_mock ....................... Passed 2.01 sec
Start 38: mock_liveness_test_getVersion
2/3 Test #38: mock_liveness_test_getVersion .... Passed 0.22 sec
Start 34: stop_mock
3/3 Test #34: stop_mock ........................ Passed 0.00 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 2.24 sec
brownjs@LAP142548:~/software/open/datafed/DataFed/build-memory-2$ LD_LIBRARY_PATH=/opt/datafed/dependencies/lib /opt/datafed/dependencies/bin/ctest -R mock
Test project /home/brownjs/software/open/datafed/DataFed/build-memory-2
Start 33: start_mock
1/3 Test #33: start_mock ....................... Passed 2.01 sec
Start 38: mock_liveness_test_getVersion
2/3 Test #38: mock_liveness_test_getVersion ....***Failed 2.02 sec
Start 34: stop_mock
3/3 Test #34: stop_mock ........................ Passed 0.00 sec
Looks like a timing issue. Running with verbose leads to the tests passing, but running without leads to failure.
/opt/datafed/dependencies/bin/ctest --verbose -R mock # passes
/opt/datafed/dependencies/bin/ctest -R mock # fails
Actually, now I'm seeing it pass even without verbose. This is making testing of this application brittle. Out of 10 test runs 4 are failing. In another run 6/20 are failing with a timeout.
Adding send and receive retries with the same communicators (zeromq sockets) on test don't help seeing 7/20 failures.
Recreating sockets and zeromq sockets on retries seems to improve things seeing 2/20 failure, and 3/20 failure.
Rerunning tests with pause between bringing down server leads to 5/20 failures.
Removing pauses between server start and client send does not change anything 5/20, 3/20/ and 6/20 failures.
Increasing pauses after creation and after shutdown to 5 seconds does not help same ratio of failures.
ZeroMQ errors and message drops were addressed in a separate issue and PR https://github.com/ORNL/DataFed/issues/1437.
They need to be merged in before the CI will pass. ZeroMQ issues have lead to instability in the integration tests.
@par-hermes format