DynamicCodingKey - Poor Performance
Describe the bug
Performance Issues with DynamicCodingKey Causing High Decoding Overhead with Dedicated Thread Resource Hogging
Description:
We're encountering significant performance issues with the DynamicCodingKey implementation in the Datadog iOS SDK, specifically in the DatadogInternal module.
Issue Details:
- The DynamicCodingKey struct (source link) is performing extremely poorly.
- We are seeing spikes where DecodingError.typeMismatch(_:in:) (source link) is triggered heavily whenever messages are sent over our message bus for updates.
- The decoding process appears to hold onto a dedicated thread, and profiling with Xcode's Instruments Time Profiler and Dynatrace attributes upwards of 50% of our weighted profiling allocations to this decoding work, which is highly disproportionate.
Possible Cause:
- Many of our messages are simple strings, and the way the switch case is structured results in excessive iteration through the type list each time a message is decoded.
- Instead of using Any and re-casting, there should be optimizations around persisting the type information to avoid redundant type-checking and excessive allocation overhead.
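To make the cause above concrete, here is a simplified, hypothetical sketch of a type-erased decoder that probes candidate types in order. It is not the SDK's DynamicCodingKey or AnyCodable code; the ErasedValue type and its failedAttempts counter are illustrative assumptions. It shows why a plain string value pays for several failed decodes, each of which constructs and throws a DecodingError.typeMismatch, before the String branch finally succeeds.

```swift
import Foundation

// Hypothetical type-erased value, loosely modelled on AnyCodable-style decoders.
// It is NOT the Datadog SDK's implementation; it only illustrates the probing cost.
struct ErasedValue: Decodable {
    let value: Any
    // Illustrative counter: how many candidate types failed before one matched.
    let failedAttempts: Int

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        var failures = 0

        // Tries one candidate type; every miss builds and throws a
        // DecodingError.typeMismatch, which is then discarded.
        func attempt<T: Decodable>(_ type: T.Type) -> T? {
            do {
                return try container.decode(type)
            } catch {
                failures += 1
                return nil
            }
        }

        // Candidate order matters: a String is only reached after Bool, Int
        // and Double have all failed.
        let decoded: Any
        if let bool = attempt(Bool.self) {
            decoded = bool
        } else if let int = attempt(Int.self) {
            decoded = int
        } else if let double = attempt(Double.self) {
            decoded = double
        } else if let string = attempt(String.self) {
            decoded = string
        } else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Unsupported value type"
            )
        }
        value = decoded
        failedAttempts = failures
    }
}
```

Decoding {"token": "abc", "flag": true} into [String: ErasedValue] records three failed attempts for the string and none for the boolean, so string-heavy metadata sits on the most expensive path.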
Expected Behavior:
- The decoding process should be significantly more optimized, reducing the performance overhead associated with DynamicCodingKey.
- Decoding should not consume excessive resources on a dedicated thread, so that performance does not degrade in high-throughput scenarios.
Impact:
- The current implementation causes major dedicated-thread allocations and overhead just to decode commonly set metadata fields, and this happens only when CrashReporting is enabled.
Suggested Fix:
- Optimize the DynamicCodingKey struct to avoid unnecessary iterations and type re-casting.
- Reduce reliance on Any and leverage type persistence to enhance decoding efficiency.
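As a sketch of the suggested direction, assuming the receiver knows the payload shape up front, the same metadata can be decoded into a concrete Codable model. UserInfoMessage and decodeUserInfo(from:) are hypothetical names for illustration, not SDK API. Each field is decoded directly as its declared type, so no candidate types are probed and no typeMismatch errors are thrown on the happy path.

```swift
import Foundation

// Hypothetical strongly typed message; the field names mirror the
// setUserInfo(id:extraInfo:) call in the reproduction code.
struct UserInfoMessage: Codable, Equatable {
    let id: String
    // Assumes string-only metadata, as in the reproduction code.
    let extraInfo: [String: String]
}

// Decodes the payload straight into the known type: one decode per field,
// with no Any boxing and no type re-casting afterwards.
func decodeUserInfo(from data: Data) throws -> UserInfoMessage {
    try JSONDecoder().decode(UserInfoMessage.self, from: data)
}
```

The trade-off is that sender and receiver must share the model definition, which is the price of removing the erasure.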
Environment:
- Datadog iOS SDK Version: 2.24.0 (SPM & Cocoapods)
- iOS Version: iOS 18.1 (and others)
- Device Model: iPhone 16 Plus (and others)
We would appreciate any guidance on how this could be improved or whether there are any planned optimizations for this area. Thanks!
Reproduction steps
Steps to Reproduce:
- Set up a dummy project with RUM + Crash Reporting.
- Enable a mechanism to update attributes.
- Monitor performance with Xcode's Instruments Time Profiler and Dynatrace.
- Observe that DecodingError.typeMismatch(_:in:) is triggered excessively, and that DynamicCodingKey is consuming an unexpectedly high amount of threading resources.
```swift
import SwiftUI
import DatadogCore
import DatadogInternal
import DatadogRUM

@Observable
class ContentViewModel {
    // Each tap updates user info and a RUM attribute; per this report, these
    // updates trigger the expensive decoding when Crash Reporting is enabled.
    func updateMetadata() {
        Datadog.setUserInfo(
            id: "12345",
            extraInfo: [
                "token": UUID().uuidString,
                "token2": UUID().uuidString,
                "token3": UUID().uuidString,
            ]
        )
        RUMMonitor
            .shared()
            .addAttribute(
                forKey: "thing",
                value: UUID().uuidString
            )
    }
}

struct ContentView: View {
    @State var viewModel = ContentViewModel()

    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
            Button("Update Meta Data") {
                viewModel.updateMetadata()
            }
        }
        .padding()
    }
}
```
```swift
import SwiftUI
import DatadogCore
import DatadogCrashReporting
import DatadogRUM

class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(
        _ application: UIApplication,
        didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]? = nil
    ) -> Bool {
        Datadog.initialize(
            with: .init(
                clientToken: "<insert>",
                env: "development",
                site: .us5
            ),
            trackingConsent: .granted
        )

        let appID = "<insert>"
        // Note: this configuration is never used; RUM.enable below builds its own.
        let configuration = RUM.Configuration(
            applicationID: appID,
            sessionSampleRate: 50
        )
        RUM.enable(
            with: RUM.Configuration(
                applicationID: appID,
                uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
                uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate(),
                urlSessionTracking: RUM.Configuration.URLSessionTracking()
            )
        )

        // Crash Reporting must be enabled for the slow decoding path to show up.
        CrashReporting.enable()
        return true
    }
}

@main
struct DataDogAppApp: App {
    @UIApplicationDelegateAdaptor var appDelegate: AppDelegate

    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
```
SDK logs
No response
Expected behavior
DynamicCodingKey should be far less resource-hungry in terms of threading.
Affected SDK versions
2.24.0
Latest working SDK version
Unsure
Did you confirm if the latest SDK version fixes the bug?
Yes
Integration Methods
SPM
Xcode Version
16.1
Swift Version
5.9 + 6
MacOS Version
15.3.1
Deployment Target
iPhone + iPad
Device Information
No response
Other relevant information
No response
Hello @blevasseur-block. Thanks for opening this issue and sharing so many great insights, really appreciated!
I was able to reproduce the problem on our end as well, so we're aligned there.
On the Suggested Fix:
- Optimize the DynamicCodingKey struct to avoid unnecessary iterations and type re-casting.
- Reduce reliance on Any and leverage type persistence to enhance decoding efficiency.
This is exactly the approach we aligned on internally some time ago. Your feedback reinforces that direction, and we're bumping up the priority as a result. The mid-term plan is to get rid of AnyCodable use in CrashContext, and the long-term direction is to drop baggage() type-erasure from cross-product communication (like RUM <> Crash Reporting) and replace it with strongly typed values.
In fact, we already started on this last year. The plan closely follows your suggestion. To quote myself:
Also moving the DDCrashReport, which unlocks the performance optimisation that we have planned for RUM-2971 as a follow-up action from the SDK latency investigation. By sharing the model definition between features, there is no longer a need to encode it through message-bus "baggage".
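A minimal sketch of what replacing baggage type-erasure with strongly typed values can look like on an in-process message bus. Every name here (CrashContextUpdate, BusMessage, BusReceiver, MessageBus, CrashReportingReceiver) is hypothetical and does not reflect the SDK's actual types; the point is that a shared model definition lets the receiver pattern-match on the value it needs instead of decoding it.

```swift
// Hypothetical shared model, defined once and used by both sender and receiver.
struct CrashContextUpdate: Equatable {
    let userID: String
    let attributes: [String: String]
}

// Hypothetical typed message: each case carries a concrete payload, not encoded baggage.
enum BusMessage {
    case crashContext(CrashContextUpdate)
    case other(name: String)
}

protocol BusReceiver: AnyObject {
    /// Returns true if the receiver handled the message.
    func receive(_ message: BusMessage) -> Bool
}

// Hypothetical bus that fans each message out to all connected receivers.
final class MessageBus {
    private var receivers: [any BusReceiver] = []

    func connect(_ receiver: any BusReceiver) {
        receivers.append(receiver)
    }

    func send(_ message: BusMessage) {
        for receiver in receivers {
            _ = receiver.receive(message)
        }
    }
}

final class CrashReportingReceiver: BusReceiver {
    private(set) var lastContext: CrashContextUpdate?

    func receive(_ message: BusMessage) -> Bool {
        // Pattern matching replaces decoding: the payload already has its final type.
        guard case .crashContext(let context) = message else { return false }
        lastContext = context
        return true
    }
}
```

Messages the receiver does not care about are rejected by a single enum match, with no decode attempt and no thrown error.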
Thanks again, your input is helping shape our roadmap!
Hi there, iOS crash reporting via Datadog is something of a superpower that my team relies on for operational metrics and for digging into session behavior.
What's the latest here? Thanks for digging in!
Hey @kwigginton. We're actively working on performance improvements related to this issue. The most critical part reported here has already been addressed, and the fix will be included in the next release. Additional performance optimizations will be rolled out gradually in subsequent releases.
The core fix, which resolves the slow crash context decoding, has already been merged in #2276. In parallel, we're also optimizing cross-product communication throughout the SDK, with ongoing efforts tracked in #2294, #2293, #2290, and others.
Hey @blevasseur-block, @kwigginton
We have released 2.28.0, which includes the performance improvement related to feature communication in the SDK. You should no longer experience overhead from type erasure or decoding errors; we have removed these complications.
Thanks again for the detailed report and insights!