Memory leak due to repeated esp_timer allocations in Callback_Watchdog when used by request-related callbacks v0.16.0
Summary:
There is a memory leak issue in Callback_Watchdog when used as a member in several request-related callback classes (OTA_Handler, Provision_Callback, Attribute_Request_Callback, and RPC_Request_Callback). The leak is caused by the repeated allocation of esp_timer instances on each call to once(...), without corresponding deallocation, especially when objects are created in local scopes and copied.
Detailed Explanation:
In version 0.15.0 of the ThingsBoard SDK, the Callback_Watchdog class creates a new ESP-IDF one-shot timer every time the once(const uint64_t& timeout_microseconds) method is called if no timer is already present.
Each of these timer creations allocates ~60 bytes from the heap, but the created timers are never freed, causing memory leaks when requests are made frequently.
This issue becomes especially problematic when using the callback classes in the following way:
const Attribute_Request_Callback<MAX_SHARED_ATTRIBUTES_AMOUNT_PER_ONCE> logLevelRequestSharedAttrCB(
&processSharedAttributeCB,
SHARED_ATTR_REQUEST_TIMEOUT_MICROSECONDS,
&logLevelRequestTimeout,
REQUESTED_SHARED_ATTRIBUTES
);
In this example, the callback is constructed as a const object in a local scope and passed by reference to Attributes_Request_Subscribe, which stores it inside an array structure by copy.
Since Callback_Watchdog is a member of the callback class and its copy constructor is defaulted, a shallow copy is made — the internal esp_timer* pointer is copied directly.
When the local object goes out of scope, its destructor runs, but the copied object still holds a reference to the same esp_timer*. This leads to repeated allocations for new timers and no corresponding destruction — resulting in a heap leak for every request.
Fix Applied in Fork:
In our fork (BatuhanKaratas/thingsboard-client-sdk), we fixed this issue with the following changes:
Added create_timer(); inside Callback_Watchdog's constructor to ensure a single timer instance is created and reused.
Used Scott Meyers' singleton pattern to avoid frequent construction/destruction of the timer object, ensuring safe reuse and avoiding dangling pointers.
As a result, usage like the following is now safe:
static const Attribute_Request_Callback<MAX_SHARED_ATTRIBUTES_AMOUNT_PER_ONCE> logLevelRequestSharedAttrCB(
&processSharedAttributeCB,
SHARED_ATTR_REQUEST_TIMEOUT_MICROSECONDS,
&logLevelRequestTimeout,
REQUESTED_SHARED_ATTRIBUTES
);
We applied similar static object changes to all other classes that hold a Callback_Watchdog member.
Environment:
ThingsBoard SDK version: 0.15.0
ESP-IDF version: v5.5.0
Platform: ESP32-S3
Tool: VS Code
Suggested Fix:
We recommend:
Refactoring Callback_Watchdog to ensure safe timer creation and reuse
Preventing shallow copies of watchdogs that lead to duplicated allocations
Documenting safe usage patterns (e.g., static callback objects)
Let us know if you'd like us to contribute a pull request with this fix.
I think this might just be a simple misunderstanding of the reason why the esp_timer is created only when the once() method is actually called and then kept alive until the destruction of the object.
A more detailed description can be found in the comment of the actual create_timer method.
This method explains why creating the timer in the constructor is not a valid choice, because if it is created in global scope it will be executed before any ESP intialization could run, therefore calling the esp method does not work as expected and produces and invalid timer handle which will later cause crashes.
The rest of the important information is written in the destructor. The idea of the implementation using the esp_timer is to reuse the actual timer object and only allocate it once the Callback_Watchdog has been used for the first time. Further calls to once() or detach() will then simply stop and restart the timer. This can be seen from the implementation, which checks if the handle is still a nullptr and if it isn't it won't allocate a new one.
This allows for minimal allocations, because the same underlying object can be reused and will only be deleted once the Callback_Watchdog itself is deleted.
But you are correct there is a memory leak and I have found and debugged the actual reason for it, it is displayed in this small test program created with an online compiler.
The problem is that the actual Array Implementation used by ThingsBoard does not call the destructor of elements that it deletes instead it simply overwrites the objects with other objects. This then causes memory leaks if the object contains heap memory.
A fixed version of the example is here. It simply includes an implicit call to the destructor of the previous object (if it has one) before overriding the current object.
The fix is in my branch as well but I have put the relevant portions below so you can try it in your codebase if you want too:
void push_back(const_reference element) {
#if THINGSBOARD_ENABLE_DYNAMIC
increase_capacity();
#else
assert(m_size < Capacity);
#endif // THINGSBOARD_ENABLE_DYNAMIC
if constexpr (std::is_destructible<T>::value) {
m_elements[m_size].~T();
}
m_elements[m_size] = element;
(void)m_size++;
}
template<typename InputIterator>
void insert(iterator position, InputIterator const & first, InputIterator const & last) {
(void)--position;
assert_iterator_in_range(position);
#if !THINGSBOARD_ENABLE_DYNAMIC
assert((m_size + Helper::distance(first, last)) < Capacity);
#endif // !THINGSBOARD_ENABLE_DYNAMIC
for (auto it = first; it != last; ++it, ++position) {
#if THINGSBOARD_ENABLE_DYNAMIC
increase_capacity();
#endif // THINGSBOARD_ENABLE_DYNAMIC
if constexpr (std::is_destructible<T>::value) {
(*it).~T();
}
*position = *it;
(void)++m_size;
}
}
void erase(const_iterator position) {
assert_iterator_in_range(position);
size_type const index = Helper::distance(cbegin(), position);
// Move all elements after the index one position to the left
for (size_type i = index; i <= m_size; ++i) {
if constexpr (std::is_destructible<T>::value) {
m_elements[i].~T();
}
m_elements[i] = m_elements[i + 1];
}
if constexpr (std::is_destructible<T>::value) {
m_elements[m_size].~T();
}
// Decrease the size of the array to remove the last element, because either it was moved one index to the left or was the element we wanted to delete anyway
(void)--m_size;
}
Furthermore the Callback_Watchdog can theortically cause a double free or other issues. If the start timer method is called by the user, before passing it to the method that method then shallow copies it. If the local variable then goes out of scope the actual pointer will be deleted, causing crashes.
I fixed that in my fork as well, a custom class that has a custom destructor should always have a custom copy constructor or alternatively be not copyable at all. Everything else is bad practice.
Thank you for the detailed explanation and for identifying the root cause of the memory leak.
Your analysis regarding the shallow copy issue and the absence of destructor calls during container operations makes complete sense.
I attempted to apply the same modifications to both the Callback_Watchdog and container classes in my own fork, but encountered some issues during runtime. After switching to your fork, I also experienced the behavior described in issue #262, including failures in RPC and attribute subscriptions.
As an additional suggestion, it may be worth considering the use of std::unique_ptr for managing dynamically allocated internal resources such as timers. This could simplify ownership management, eliminate the need for manual destructor calls, and help avoid potential double free or use-after-free scenarios caused by unintended copies. It would also align with modern C++ best practices and contribute to safer and more maintainable code.
Thanks again for your efforts and continued improvements to the library.