Crash on upgrade from 8.20.0 -> 8.25.0 via Span.finish
Platform
iOS
Environment
Develop
Installed
Swift Package Manager
Version
8.25.0
Did it work on previous versions?
8.20.0
Steps to Reproduce
- Install a version of the app that is using version 8.20.0 of the Sentry iOS SDK
- Update to a version of the app that is using version 8.25.0 of the Sentry iOS SDK
- Crash occurs on startup
Expected Result
We expect the crash not to occur
Actual Result
Few other notes:
- The span being finished is one that captures the boot time for all our internal systems at startup
- After the crash, if you restart the app on the 8.25.0 version there won't be any crash after that
Are you willing to submit a PR?
No response
Hey @nebsta,
I tried to reproduce the issue, but couldn't. I see that you have your own wrapper around SenrySpan. Can you share more details on how to initialize your instance of SentrySpan so we can reproduce the issue? A small sample project that reproduces the issue would even be the best. Furthermore, it would be great if you could share the crash report with us.
It seems like the culprit is #3892, which moved capturing the transaction to a background thread.
Hey @nebsta,
I tried to reproduce the issue, but couldn't. I see that you have your own wrapper around SenrySpan. Can you share more details on how to initialize your instance of SentrySpan so we can reproduce the issue? A small sample project that reproduces the issue would even be the best. Furthermore, it would be great if you could share the crash report with us.
It seems like the culprit is #3892, which moved capturing the transaction to a background thread.
@philipphofmann sorry for the late reply. Didn't get a notification about you responding. Here is a snippet of what the wrapper looks like. We mainly have this so that we can swap out reporting libraries in the future if we want to. As you can see it's pretty simple, nothing too fancy.
// Copyright © 2023 Trustpilot. All rights reserved.
import Foundation
import Sentry
class SentrySpan: SentrySpanProtocol {
private let sentrySpan: Span
init(span: Span) {
self.sentrySpan = span
}
func startChild(operation: String) -> SentrySpanProtocol {
let sentrySpan = self.sentrySpan.startChild(operation: operation)
return SentrySpan(span: sentrySpan)
}
func finish(status: TrustpilotSpanStatus) {
sentrySpan.finish(status: status.toSentrySpanStatus())
}
func finish() {
sentrySpan.finish()
}
}
extension TrustpilotSpanStatus {
func toSentrySpanStatus() -> SentrySpanStatus {
switch self {
case .internalError: return .internalError
}
}
}
// Copyright © 2023 Trustpilot. All rights reserved.
import Foundation
protocol SentrySpanProtocol {
func startChild(operation: String) -> SentrySpanProtocol
func finish(status: TrustpilotSpanStatus)
func finish()
}
enum TrustpilotSpanStatus {
case internalError
}
I'll see if I can get a full crash report from the sentry dashboard and post it here soon.
Thanks for the update, @nebsta. The full crash report would be helpful. I think I see a couple of occurrences of this in our internal SDK crash detection.
@philipphofmann classically I can't seem to replicate the issue anymore, so I'm unable to get the crash log for you. But by the sounds of it, it seems like it's a known issue if you have seen occurrences of it.
Yes, I think we should have enough information to fix it. Thanks for the update.
@philipphofmann any update on a fix?
@nebsta, I'm sorry no, not yet. We didn't get to this yet.
@philipphofmann are there any updates on this issue? We are currently blocked with upgrading to the latest sentry, as if we do, it will cause most (if not all) of our users to experience a crash on start up.
Hi @nebsta, still no updates. But we will keep this issue up to date.
Hey @krystofwoldrich @philipphofmann - Just to update on this, this issue is currently blocking us from upgrading to Xcode 16 as 8.20.0 won't compile with Xcode 16 and upgrading to 8.36.0 (the latest version) still causes this crash. Were you able to reproduce this issue?
Hello @P-Foley-Trustpilot can you share the error Xcode 16 gives with version 8.20.0?
Hi @brustolin, i've attached the errors list and where they occur
Oh yes. We have fixes for those missing references. And Im sorry, but we dont have any feedback on the Span issue, we couldn't reproduce it.
Oh yes. We have fixes for those missing references. And Im sorry, but we dont have any feedback on the Span issue, we couldn't reproduce it.
This seem to be at odds with what @philipphofmann was saying earlier in the thread. He mentioned that you are seeing these issues and that you have all the information you need to resolve it
philipphofmann on May 29 Thanks for the update, @nebsta. The full crash report would be helpful. I think I see a couple of occurrences of this in our internal SDK crash detection.
philipphofmann on May 29 Yes, I think we should have enough information to fix it. Thanks for the update.
I see. Yeah, maybe @philipphofmann knows something that I dont. Let me align with him.
We fixed the issue of the missing reference mentioned in https://github.com/getsentry/sentry-cocoa/issues/3977#issuecomment-2358020031 with https://github.com/getsentry/sentry-cocoa/issues/4050 shipped in 8.29.0.
Now, we only need to fix this error. I bumped the priority for this, but I can't give you an ETA yet.
We fixed the issue of the missing reference mentioned in #3977 (comment) with #4050 shipped in 8.29.0.
Now, we only need to fix this error. I bumped the priority for this, but I can't give you an ETA yet.
Thanks for the response @philipphofmann. We'll wait for an ETA from you 👍
@nebsta, unfortunately, all the crashes I see in our internal SDK crash detection are before 8.13.1, in which we fixed a crash in SentryTracer.finishInternal with #3333. I can't reproduce the issue, and I don't have a crash report. Therefore, I can't fix the underlying problem here. Do you have a repro or a crash report, @nebsta? Otherwise, I can't do anything at the moment.
@nebsta, unfortunately, all the crashes I see in our internal SDK crash detection are before 8.13.1, in which we fixed a crash in
SentryTracer.finishInternalwith #3333. I can't reproduce the issue, and I don't have a crash report. Therefore, I can't fix the underlying problem here. Do you have a repro or a crash report, @nebsta? Otherwise, I can't do anything at the moment.
Hey @philipphofmann interestingly, I can only seem to replicate the issue on a simulator, but not on device. If we upgrade from 8.20.0 -> 8.36.0, then we see the same exception as I posted originally on simulator, but it seems fine on device. This might just be due to the limited testing I'm doing. We are going to be doing more rigorous regression testing on our Swift 6/XCode 16 update, so we will report back if we can see the exception happening then on devices.
@nebsta, thanks for the update. Can you share a stacktrace of a crash on a simulator?
@nebsta, thanks for the update. Can you share a stacktrace of a crash on a simulator?
I can't seem to export the detailed stack trace. The best I can do is the screenshot that I originally attached to the ticket
@philipphofmann I had a thought the other day. Are the span operations thread safe? Because we're currently calling span.finish within an actor, so it's being executed in the threadpool and not on the main actor. Could that potentially be why we're seeing this problem?
Hello @nebsta, span operations are thread safe, finishing it outside the main actor should not cause a problem.
Do you have Sentry profiling enabled? If so, can you test without it?
Hello @nebsta, span operations are thread safe, finishing it outside the main actor should not cause a problem.
Do you have Sentry profiling enabled? If so, can you test without it?
Yeah we do, I can turn it off and see if that resolves the issue
@brustolin I tried setting the profiling sample rate to 0, and it doesn't seem to be happening anymore, though I also noticed before changing the same rate, that the exception was happening on my device via XCode even on a fresh install. Which is different to how we originally reported it, where it would only happen when we updated from an older version of sentry.
@brustolin @philipphofmann I did notice one thing that may be related to this issue. Would like to get your thoughts on it.
Basically we use the manually profiling transactions/spans for timing how long our bootstrapping process takes in the app. Our Bootstrapping system is broken up into steps, with each step being a single span within the overall bootstrap transaction.
I just noticed that there is actually a bug whereby if one of the bootstrapping items fails, we actually never finish the running span, or the running transaction. This would also makes sense, as I have also seen our profiling sometimes showing ridiculously large bootstrapping times, which this is probably the reason for.
The question is that if we don't finish a span/transaction, could there be issue here if we try to finish the same span/transaction when a previous one wasn't finished correctly?
The question is that if we don't finish a span/transaction, could there be issue here if we try to finish the same span/transaction when a previous one wasn't finished correctly?
Yes, that could cause problems, but he SDK shouldn't crash. If you can confirm that this fixes the problem, it's easier for us to investigate the root cause.
The question is that if we don't finish a span/transaction, could there be issue here if we try to finish the same span/transaction when a previous one wasn't finished correctly?
Finishing or refinishing a span should not matter; we have safeguards for these scenarios.
If you could provide us with a reproduction of your flow and how you use transactions/spans, it would help us a lot in identifying whether we have a bug in the said safeguards.
@brustolin I've outlined an explanation of how we're using them below, including some code examples as well. Hopefully that gives you a better picture.
The main class where we use the spans and where this exception is happening is called the Bootstrapper. This is an actor, which goes through a list of BootstrapItem instances (which are each themselves actor instances as well), and executes some setup work that we need to perform on the startup of our application.
You can see a full gist of what the Boostrapper looks like here, and what one of the BootstrapItem implementations look like here
If you look within the Bootstrapper.loadAll method, you'll see where we are creating/finishing all the spans for the bootstrap process. We start 1 span for the entire bootstrap process, and then spin off child spans for each item so that we can get a better breakdown of where our bottlenecks are.
let transaction = dependencies.sentry.startTransaction(name: "Bootstrap", operation: "Start") // HERE
for item in items {
let itemName = String(describing: type(of: item))
TLog.bootstrap.info("Loading item: \(itemName)")
dependencies.sentry.addBreadcrumb(BootstrapItemBreadcrumb.started(name: itemName))
var span: SentrySpanProtocol?
do {
span = transaction?.startChild(operation: itemName) // HERE
try await item.load()
try Task.checkCancellation()
dependencies.sentry.addBreadcrumb(BootstrapItemBreadcrumb.finished(name: itemName))
span?.finish()
} catch {
TLog.bootstrap.error("Error encountered on item: \(itemName), error: \(error)")
span?.finish(status: .internalError) // HERE
transaction?.finish() // HERE
dependencies.sentry.addBreadcrumb(BootstrapItemBreadcrumb.failed(name: itemName))
/// Some bootstrap items may not need to raise errors up to Sentry, so we log them here instead.
guard item.shouldRaiseFailureToSentry else {
TLog.bootstrap.error("Error encountered on item: \(itemName), error: \(error)")
return
}
throw BootstrapError.itemFailed(itemName: String(itemName), error: error)
}
}
transaction?.finish() // HERE
As I mentioned, there is a bug in this implementation, as if there is an exception in any of the bootstrap items, then we don't properly finish the span for the entire bootstrap process, which is something we're going to fix on our end. This is why I was wondering if perhaps this could be the reason for our crash, as I mentioned that we do sometimes see some instances in the 99th percentile that are reported to be going for as long as a couple of days, so I wonder if there is an issue in the mechanism for ensure that spans are always cleaned up.
Hopefully that's enough to give you a better picture.
I should also mention that the spans we're using the Bootstrapper are actually our own span class (which you can see here, which is just a light wrapper around the Sentry Span class. We do this so that our reporting manager is SDK agnostic, and so we can more easily write unit testing.