DynamicCodingKey - Poor Performance

Open • blevasseur-block opened this issue 9 months ago • 1 comment

Describe the bug

Performance Issues with DynamicCodingKey Causing High Decoding Overhead with Dedicated Thread Resource Hogging

Description: We're encountering significant performance issues with the DynamicCodingKey implementation in the Datadog iOS SDK, specifically in the DatadogInternal module.

Issue Details:

  • The DynamicCodingKey struct (source link) is performing extremely poorly.
  • We are seeing spikes where DecodingError.typeMismatch(_:in:) (source link) is being triggered heavily whenever messages are sent over our message bus for updates.
  • The decoding process appears to hold onto a dedicated thread, and profiling in Xcode's Instruments Time Profiler and Dynatrace shows that upwards of 50% of our weighted profiling allocations are attributed to this decoding work, which is highly disproportionate.

Possible Cause:

  • Many of our messages are simple strings, and the way the switch case is structured results in excessive iteration through the type list each time a message is decoded.
  • Instead of using Any and re-casting, there should be optimizations around persisting the type information to avoid redundant type-checking and excessive allocation overhead.
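To make the suspected cost concrete, here is a minimal sketch of a type-erased decoder that probes candidate types in a fixed order. `ErasedValue`, `failedAttempts`, and `decodeAttributes` are hypothetical names for illustration only; this is not the SDK's actual code, and the real type order may differ. It only shows why a plain string can pay for several failed attempts (each one a thrown `DecodingError.typeMismatch`) before it matches.

```swift
import Foundation

// Hypothetical type-erased wrapper for illustration; NOT the SDK's implementation.
struct ErasedValue: Decodable {
    let value: Any
    /// How many candidate types failed before one matched (illustration only).
    let failedAttempts: Int

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        var failures = 0

        // Tries one candidate type; a mismatch throws and is counted here.
        func attempt<T: Decodable>(_ type: T.Type) -> T? {
            do {
                return try container.decode(type)
            } catch {
                failures += 1
                return nil
            }
        }

        // Fixed probing order: a String is only reached after three failures.
        if let bool = attempt(Bool.self) {
            value = bool
        } else if let int = attempt(Int.self) {
            value = int
        } else if let double = attempt(Double.self) {
            value = double
        } else if let string = attempt(String.self) {
            value = string
        } else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Unsupported value"
            )
        }
        failedAttempts = failures
    }
}

// Decodes a JSON object whose values are all type-erased.
func decodeAttributes(_ json: String) throws -> [String: ErasedValue] {
    try JSONDecoder().decode([String: ErasedValue].self, from: Data(json.utf8))
}
```

With this ordering, decoding `{"token": "abc"}` constructs and discards three errors for a single string value, and that cost repeats for every attribute on every update.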

Expected Behavior:

  • The decoding process should be significantly more optimized, reducing the performance overhead associated with DynamicCodingKey.
  • Decoding should not consume excessive resources on a dedicated thread, preventing performance degradation in high-throughput scenarios.

Impact:

  • The current implementation causes major dedicated-thread allocations and overhead just to decode commonly set metadata fields, and only when CrashReporting is enabled.

Suggested Fix:

  • Optimize the DynamicCodingKey struct to avoid unnecessary iterations and type re-casting.
  • Reduce reliance on Any and leverage type persistence to enhance decoding efficiency.
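One way to read "type persistence" is to store a type tag next to each value, so the decoder can switch on the tag instead of probing types one by one. The sketch below assumes a hypothetical `AttributeValue` enum and a `type`/`value` wire layout; neither comes from the SDK.

```swift
import Foundation

// Hypothetical attribute value; NOT a type from the Datadog SDK.
enum AttributeValue: Codable, Equatable {
    case string(String)
    case int(Int)
    case double(Double)
    case bool(Bool)

    private enum Keys: String, CodingKey { case type, value }
    private enum Kind: String, Codable { case string, int, double, bool }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: Keys.self)
        // The persisted tag selects the type directly: no trial and error.
        switch try c.decode(Kind.self, forKey: .type) {
        case .string: self = .string(try c.decode(String.self, forKey: .value))
        case .int: self = .int(try c.decode(Int.self, forKey: .value))
        case .double: self = .double(try c.decode(Double.self, forKey: .value))
        case .bool: self = .bool(try c.decode(Bool.self, forKey: .value))
        }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: Keys.self)
        switch self {
        case .string(let v):
            try c.encode(Kind.string, forKey: .type)
            try c.encode(v, forKey: .value)
        case .int(let v):
            try c.encode(Kind.int, forKey: .type)
            try c.encode(v, forKey: .value)
        case .double(let v):
            try c.encode(Kind.double, forKey: .type)
            try c.encode(v, forKey: .value)
        case .bool(let v):
            try c.encode(Kind.bool, forKey: .type)
            try c.encode(v, forKey: .value)
        }
    }
}
```

The trade-off is a slightly larger payload in exchange for a decode path that does not throw on the common case.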

Environment:

  • Datadog iOS SDK Version: 2.24.0 (SPM & Cocoapods)
  • iOS Version: iOS 18.1 (and others)
  • Device Model: iPhone 16 Plus (and others)

We would appreciate any guidance on how this could be improved or whether there are any planned optimizations for this area. Thanks!

Reproduction steps

Steps to Reproduce:

  1. Set up a dummy project with RUM + Crash reporting.
  2. Enable a mechanism to update attributes.
  3. Monitor performance with Xcode's Instruments Time Profiler and Dynatrace.
  4. Observe that DecodingError.typeMismatch(_:in:) is triggered excessively, and that DynamicCodingKey is consuming an unexpectedly high amount of threading resources.

```swift
// View model and view: each button tap updates user info and a RUM attribute.
import SwiftUI
import DatadogCore
import DatadogInternal
import DatadogRUM

@Observable
class ContentViewModel {
    func updateMetadata() {
        // New random values on every call, so each tap produces fresh attributes.
        Datadog.setUserInfo(
            id: "12345",
            extraInfo: [
                "token": UUID().uuidString,
                "token2": UUID().uuidString,
                "token3": UUID().uuidString,
            ]
        )

        RUMMonitor
            .shared()
            .addAttribute(
                forKey: "thing",
                value: UUID().uuidString
            )
    }
}

struct ContentView: View {
    @State var viewModel = ContentViewModel()

    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
            Button("Update Meta Data") {
                viewModel.updateMetadata()
            }
        }
        .padding()
    }
}
```
```swift
// App setup: initializes the SDK, then enables RUM and Crash Reporting.
import SwiftUI
import DatadogCore
import DatadogCrashReporting
import DatadogRUM

class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey : Any]? = nil) -> Bool {
        Datadog.initialize(
            with: .init(
                clientToken: "<insert>",
                env: "development",
                site: .us5
            ),
            trackingConsent: .granted
        )

        let appID = "<insert>"

        RUM.enable(
            with: RUM.Configuration(
                applicationID: appID,
                uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
                uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate(),
                urlSessionTracking: RUM.Configuration.URLSessionTracking()
            )
        )
        // The reported overhead only appears with Crash Reporting enabled.
        CrashReporting.enable()
        return true
    }
}

@main
struct DataDogAppApp: App {
    @UIApplicationDelegateAdaptor var appDelegate: AppDelegate
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
```

SDK logs

No response

Expected behavior

DynamicCodingKey should be far less resource-hungry in terms of threading.

Affected SDK versions

2.24.0

Latest working SDK version

Unsure

Did you confirm if the latest SDK version fixes the bug?

Yes

Integration Methods

SPM

Xcode Version

16.1

Swift Version

5.9 + 6

MacOS Version

15.3.1

Deployment Target

iPhone + iPad

Device Information

No response

Other relevant information

No response

blevasseur-block, Mar 24 '25 19:03

Hello @blevasseur-block 👋. Thanks for opening this issue and sharing so many great insights, really appreciated!

I was able to reproduce the problem on our end as well, so we're aligned there 👍.

On the Suggested Fix:

  • Optimize the DynamicCodingKey struct to avoid unnecessary iterations and type re-casting.
  • Reduce reliance on Any and leverage type persistence to enhance decoding efficiency.

This is exactly the approach we aligned on internally some time ago. Your feedback reinforces that direction, and we're bumping up the priority as a result 🙌. The mid-term plan is to get rid of AnyCodable use in CrashContext, and the long-term direction is to drop baggage() type-erasure from cross-product communication (like RUM <> Crash Reporting) and replace it with strongly typed values.

In fact, we already started on this last year. The plan closely follows your suggestion. To quote myself:

Moving also the DDCrashReport which unlocks the performance optimisation that we have planned for RUM-2971 as a follow up action from SDK latency investigation. By sharing the model definition between features, there is no longer a need for coding it through message-bus "baggage".
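As a rough illustration of that direction, the sketch below passes a shared, strongly typed model between two features in memory, so nothing is encoded into or decoded out of a type-erased "baggage". All names here (`UserInfo`, `BusMessage`, `MessageBus`, `CrashContextReceiver`) are hypothetical and only loosely inspired by the description above; they are not the SDK's types.

```swift
// Hypothetical names throughout; this is not the SDK's message bus.

// Model definition shared by the sending and the receiving feature.
struct UserInfo: Equatable {
    let id: String
    let extraInfo: [String: String]
}

// Each message carries its payload as a concrete type.
enum BusMessage {
    case userInfoChanged(UserInfo)
    case attributeAdded(key: String, value: String)
}

final class MessageBus {
    private var receivers: [(BusMessage) -> Void] = []

    func subscribe(_ receiver: @escaping (BusMessage) -> Void) {
        receivers.append(receiver)
    }

    func send(_ message: BusMessage) {
        receivers.forEach { $0(message) }
    }
}

// Stand-in for a feature that keeps crash context up to date.
final class CrashContextReceiver {
    private(set) var lastUserInfo: UserInfo?
    private(set) var attributes: [String: String] = [:]

    func attach(to bus: MessageBus) {
        bus.subscribe { [weak self] message in
            // Pattern matching replaces decoding: no Any, no typeMismatch.
            switch message {
            case .userInfoChanged(let info):
                self?.lastUserInfo = info
            case .attributeAdded(let key, let value):
                self?.attributes[key] = value
            }
        }
    }
}
```

A sender would call `bus.send(.userInfoChanged(...))` and the receiver reads the value as-is, which is the kind of overhead removal the quoted plan describes.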

Thanks again. Your input is helping shape our roadmap!

ncreated, Mar 31 '25 13:03

Hi there, iOS crash reporting via Datadog is something of a superpower that my team relies on for operational metrics and for digging into session behavior.

What's the latest here? Thanks for digging in

kwigginton, May 13 '25 19:05

Hey @kwigginton 👋. We're actively working on performance improvements related to this issue. The most critical part reported here has already been addressed, and the fix will be included in the next release. Additional performance optimizations will be rolled out gradually in subsequent releases.

The core fix, which resolves the slow crash context decoding, has already been merged in #2276. In parallel, we're also optimizing cross-product communication throughout the SDK, with ongoing efforts tracked in #2294, #2293, #2290, and others.

ncreated, May 14 '25 07:05

Hey @blevasseur-block, @kwigginton 👋

We have released 2.28.0, which includes the performance improvement related to feature communication in the SDK. You should no longer experience overhead from type erasure or decoding errors; we have removed these complications.

Thanks again for the detailed report and insights!

maxep, May 27 '25 09:05