Investigate `DocRoute` related crashes
How to test
-
deploy to prod (currently no other way to reproduce)
-
run doc-test
- typically has at least one 521 failure, perhaps run a couple of times if not
- three runs without errors nor crashes is strong evidence the problem is fixed
-
observe server restart rate
-
2.102.4 is OK
-
2.102.5 causes crashes, seems to be triggered by running doc-test (or just regular doc browsing)
-
doesn't look like the caching is the problem, crashes and doc-test failures persist when cache is disabled
-
Cannot reproduce locally when running server via Xcode and running doc-test rester command against the local server (with db snapshot from April 24)
-
2.102.9 is OK (3 doc-tests without error, no crashes)
- this release has the changes of 2.102.5 rolled back
Other things to try
- run the actual linux image
- try to figure out why we're not seeing stack traces
I managed to trigger a crash running the server locally in a docker container:
[ WARNING ] Abort.303: See Other [component: server, request-id: 3F3990CD-999A-42C9-A427-A66F3AFE2425]
[ INFO ] GET /simonbs/Runestone/0.5.0/documentation/runestone [component: server, request-id: AA9CC60E-6738-48AD-891B-53F00791F4C1]
Swift/StringGraphemeBreaking.swift:596: Fatal error: Unexpectedly found nil while unwrapping an Optional value
💣 Program crashed: System trap at 0x0000aaaab3fb8f88
Thread 0 "NIO-SGLTN-0-#7" crashed:
0 0x0000aaaab3fb8f88 closure #1 in closure #1 in closure #1 in _assertionFailure(_:_:file:line:flags:) + 236 in Run
1 0x0000aaaab3fb8d74 closure #1 in closure #1 in _assertionFailure(_:_:file:line:flags:) + 199 in Run
2 0x0000aaaab3fb8c64 closure #1 in _assertionFailure(_:_:file:line:flags:) + 327 in Run
3 0x0000aaaab3fb8834 _assertionFailure(_:_:file:line:flags:) + 175 in Run
4 0x0000aaaab40e0cc4 _StringGuts._opaqueCharacterStride(startingAt:) + 707 in Run
5 0x0000aaaab40dfba4 String.count.getter + 235 in Run
6 specialized StringProtocol<>.sqlkit_replacing<A, B>(_:with:) + 111 in Run at .build/checkouts/sql-kit/Sources/SQLKit/Utilities/StringHandling.swift:31:52
7 SQLIdentifier.serialize(to:) + 259 in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Expressions/Syntax/SQLIdentifier.swift:42:56
8 SQLStatement.serialize(to:) + 391 in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Expressions/SQLStatement.swift:121:18
9 SQLAlias.serialize(to:) + 331 in Run at .build/checkouts/sql-kit/Sources/SQLKit/Expressions/SQLStatement.swift:52:13
10 SQLList.serialize(to:) + 111 in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Expressions/Syntax/SQLList.swift:39:22
11 SQLStatement.serialize(to:) + 391 in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Expressions/SQLStatement.swift:121:18
12 SQLSerializer.statement(_:) in Run at .build/checkouts/sql-kit/Sources/SQLKit/Expressions/SQLStatement.swift:52:13
13 SQLSelect.serialize(to:) in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Expressions/Queries/SQLSelect.swift:99:20
14 SQLDatabase.serialize(_:) + 123 in Run at /build/.build/checkouts/sql-kit/Sources/SQLKit/Database/SQLDatabase.swift:176:20
15 _PostgresSQLDatabase.execute(sql:_:) + 119 in Run at /build/.build/checkouts/postgres-kit/Sources/PostgresKit/PostgresDatabase+SQL.swift:59:33
16 _FluentPostgresDatabase.execute(sql:_:) in Run at /build/.build/checkouts/fluent-postgres-driver/Sources/FluentPostgresDriver/FluentPostgresDatabase.swift:115:23
17 _FluentPostgresDatabase.execute(query:onOutput:) + 371 in Run at /build/.build/checkouts/fluent-postgres-driver/Sources/FluentPostgresDriver/FluentPostgresDatabase.swift:29:21
18 QueryBuilder.run(_:) + 1479 in Run at /build/.build/checkouts/fluent-kit/Sources/FluentKit/Query/Builder/QueryBuilder.swift:338:34
19 QueryBuilder.all(_:) + 167 in Run at /build/.build/checkouts/fluent-kit/Sources/FluentKit/Query/Builder/QueryBuilder.swift:255:25
20 QueryBuilder.all() + 179 in Run at /build/.build/checkouts/fluent-kit/Sources/FluentKit/Query/Builder/QueryBuilder.swift:231:14
21 specialized JoinedQueryBuilder.all() + 35 in Run at Sources/App/Core/Query+Support/JoinedQueryBuilder.swift:133:32
22 specialized static DocumentationMetadata.query(on:owner:repository:) in Run at /build/Sources/App/Core/DocumentationVersion.swift:44
23 specialized static PackageController.documentation(req:route:) in Run at /build/Sources/App/Controllers/PackageController+routes.swift:59
24 closure #5 in docRoutes(_:) in Run at /build/Sources/App/routes+documentation.swift:37
25 closure #1 in RoutesBuilder.on<A>(_:_:body:use:) in Run at /build/.build/checkouts/vapor/Sources/Vapor/Concurrency/RoutesBuilder+Concurrency.swift:148
26 0x0000aaaab3cfaa38 thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A)specialized partial apply in Run
27 0x0000aaaab42cb5c8 completeTaskWithClosure(swift::AsyncContext*, swift::SwiftError*) in Run
Backtrace took 9.48s
Press space to interact, D to debug, or any other key to quit (22s)
The way this is triggered is by running our doc-test:
env base_url=http://localhost:8080 rester restfiles/doc-test.restfile
A single run through all our 134 tests seems to do.
What's notable is that it neither crashes calling up the same doc url nor trigger the same stack trace:
[ WARNING ] Abort.303: See Other [component: server, request-id: B20BCF26-4C1C-4AEA-9AEC-34B8D62E15DD]
[ INFO ] GET /vtourraine/AcknowList/3.1.0/documentation/acknowlist [component: server, request-id: 258804E1-E570-45DC-8208-A88971D16B23]
swift-runtime: unable to suspend thread 434
swift-runtime: unable to suspend thread 436
swift-runtime: unable to suspend thread 418
swift-runtime: unable to suspend thread 434
swift-runtime: unable to suspend thread 436
💣 Swift runtime failure: Unexpectedly found nil while unwrapping an Optional value
Thread 0 "NIO-SGLTN-0-#1" crashed:
0 closure #1 in Socket.read(pointer:) in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/Socket.swift:220:72
1 specialized BaseSocket.withUnsafeHandle<A>(_:) in Run at .build/checkouts/swift-nio/Sources/NIOPosix/BaseSocket.swift:364:20
2 Socket.read(pointer:) + 180 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/Socket.swift:219:20
3 closure #1 in ByteBuffer.withMutableWritePointer(body:) + 23 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SocketChannel.swift:34:40
4 closure #1 in ByteBuffer.writeWithUnsafeMutableBytes(minimumWritableBytes:_:) in Run at /build/.build/checkouts/swift-nio/Sources/NIOCore/ByteBuffer-core.swift:662:74
5 closure #1 in BaseStreamSocketChannel.readFromSocket() + 71 in Run at .build/checkouts/swift-nio/Sources/NIOPosix/SocketChannel.swift:33:22
6 specialized closure #1 in ByteBuffer.modifyIfUniquelyOwned<A>(minimumCapacity:_:) + 199 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/PooledRecvBufferAllocator.swift:188:24
7 specialized ByteBuffer.modifyIfUniquelyOwned<A>(minimumCapacity:_:) in Run at .build/checkouts/swift-nio/Sources/NIOPosix/PooledRecvBufferAllocator.swift:186:25
8 specialized PooledRecvBufferAllocator.reuseExistingBuffer<A>(_:) + 135 in Run at .build/checkouts/swift-nio/Sources/NIOPosix/PooledRecvBufferAllocator.swift:107:46
9 specialized PooledRecvBufferAllocator.buffer<A>(allocator:_:) in Run at .build/checkouts/swift-nio/Sources/NIOPosix/PooledRecvBufferAllocator.swift:96:43
10 BaseStreamSocketChannel.readFromSocket() + 319 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/BaseStreamSocketChannel.swift:118:64
11 0x0000aaaad07a49b0 specialized BaseSocketChannel.readable0() + 43 in Run
12 specialized SelectableEventLoop.handleEvent<A>(_:channel:) + 403 in Run at .build/checkouts/swift-nio/Sources/NIOPosix/BaseSocketChannel.swift:1042:29
13 closure #2 in closure #2 in SelectableEventLoop.run() + 163 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:478:30
14 specialized Selector.whenReady0(strategy:onLoopBegin:_:) + 971 in Run at .build/checkouts/swift-nio/Sources/NIOPosix/SelectorEpoll.swift:252:25
15 specialized Selector.whenReady(strategy:onLoopBegin:_:) in Run at .build/checkouts/swift-nio/Sources/NIOPosix/SelectorGeneric.swift:288:18
16 SelectableEventLoop.run() + 491 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:470:36
17 specialized static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:_:) + 295 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:93:22
18 closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:) + 131 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:114:41
19 closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:) + 311 in Run at /build/.build/checkouts/swift-nio/Sources/NIOPosix/ThreadPosix.swift:112:13
20 0x0000ffff8e15d5c8 <unknown> in libc.so.6
Backtrace took 7.65s
Ok, the plot thickens. I can actually reliably reproduce crashes on both macOS and Linux if I run the server in release mode. It crashes on neither platform if it's a debug build.
I just didn't see the crash on macOS earlier because I was running it via the normal (debug) Run target in Xcode.
The crashes when running in release on macOS seem to be "stable" in their location and backtrace. It always looks like this:
NIO-SGLTN-0-#1 (3)
#0 0x0000000195df06b4 in objc_release ()
#1 0x00000001a6212cf8 in swift_arrayDestroy ()
#2 0x00000001a5f4c218 in _DictionaryStorage.deinit ()
#3 0x00000001a5f4c294 in _DictionaryStorage.__deallocating_deinit ()
#4 0x00000001a622d1a0 in _swift_release_dealloc ()
#5 0x00000001a622dcf8 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#6 0x0000000100cc865c in destroy for Request.RequestBox ()
#7 0x00000001a5f6de64 in ManagedBuffer.deinit ()
#8 0x0000000100335930 in LockStorage.deinit [inlined] at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOConcurrencyHelpers/NIOLock.swift:159
#9 0x000000010033590c in LockStorage.__deallocating_deinit ()
#10 0x00000001a622d1a0 in _swift_release_dealloc ()
#11 0x00000001a622dcf8 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#12 0x0000000100cc7a08 in Request.deinit ()
#13 0x0000000100cc7a38 in Request.__deallocating_deinit ()
#14 0x00000001a622d1a0 in _swift_release_dealloc ()
#15 0x00000001a622dcf8 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#16 0x0000000100caf374 in ___lldb_unnamed_symbol1285567 ()
#17 0x00000001a622d1a0 in _swift_release_dealloc ()
#18 0x00000001a622dcf8 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#19 0x0000000100caf3e8 in ___lldb_unnamed_symbol1285568 ()
#20 0x00000001a622d1a0 in _swift_release_dealloc ()
#21 0x00000001a622dcf8 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#22 0x0000000100c52124 in specialized closure #1 in EventLoopPromise._resolve(value:) ()
#23 0x0000000100c5208c in partial apply for specialized closure #1 in EventLoopPromise._resolve(value:) ()
#24 0x0000000100404f30 in closure #4 in SelectableEventLoop.run() [inlined] at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:534
#25 0x0000000100404f20 in specialized closure #1 in withAutoReleasePool<τ_0_0>(_:) [inlined] at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:26
#26 0x0000000100404f20 in specialized autoreleasepool<τ_0_0>(invoking:) [inlined] ()
#27 0x0000000100404f20 in specialized withAutoReleasePool<τ_0_0>(_:) [inlined] at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:25
#28 0x0000000100404f20 in SelectableEventLoop.run() at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:531
#29 0x00000001003e9020 in specialized static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:_:) at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:93
#30 0x00000001003e6880 in specialized static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:_:) [inlined] ()
#31 0x00000001003e6858 in closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:) at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:114
#32 0x00000001003ea9d4 in partial apply for closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:) ()
#33 0x00000001003ed21c in thunk for @escaping @callee_guaranteed (@guaranteed NIOThread) -> () ()
#34 0x00000001004200d4 in closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:) at /Users/sas/Library/Developer/Xcode/DerivedData/spi-server-gsbjdobzvfgnckdblpfcxbfdtnun/SourcePackages/checkouts/swift-nio/Sources/NIOPosix/ThreadPosix.swift:112
#35 0x0000000105e2d5c0 in _pthread_start ()
and is happening within LockStorage.deinit.
(We're introducing a cache in the code changes that trigger this issue and it is using a lock internally. The crashes persist even if I remove the cache entirely, and the crash location does not change at all when the caching dependency is removed.)
swift-nio version is 2.62.0
That's a 99.99% a heap smasher. Given that the stack is unstable it's likely to be a use after free. And the vast vast vast majority of use after frees in Swift are caused by threading bugs somewhere. The stack that is catching the heap smasher is usually not the one that has caused it at all.
Thanks, Johannes, that makes sense! I'm wondering, shouldn't Asan/Tsan be giving some pointers (hah!) as to where this might be happening?
For reference, the changes that are introducing the crash are in this PR: https://github.com/SwiftPackageIndex/SwiftPackageIndex-Server/pull/3036/files. Leaving the test changes aside, there's not a whole lot going on here and I have to wonder if this is even a problem intrinsic to these changes or they just happen to shuffle things sufficiently to make a problem that's already present elsewhere manifest itself.
Thanks, Johannes, that makes sense! I'm wondering, shouldn't Asan/Tsan be giving some pointers (hah!) as to where this might be happening?
They should help but it's not guaranteed unfortunately. They don't see everything in Swift.
I've reduced this to a reproducer that only involves Vapor and swift-algorithms:
-
Package.swift:
// swift-tools-version: 5.10
import PackageDescription
let package = Package(
name: "release-crash",
platforms: [.macOS(.v11)],
dependencies: [
.package(url: "https://github.com/apple/swift-algorithms.git", from: "1.2.0"),
.package(url: "https://github.com/vapor/vapor.git", from: "4.96.0"),
],
targets: [
.executableTarget(name: "release-crash", dependencies: [
.product(name: "Algorithms", package: "swift-algorithms"),
.product(name: "Vapor", package: "vapor")
]),
]
)
-
main.swift:
import Vapor
import Algorithms
import Vapor
let app = try Application(.detect())
defer { app.shutdown() }
// https://github.com/SwiftPackageIndex/SwiftPackageIndex-Server/issues/3021
// Trigger via
// curl -sL -w "%{http_code}" -o /dev/null http://localhost:8080/crash/1234567890
// NB: length of the archive parameter is relevant, it MUST be >= 10 characters long
app.get("crash", ":archive") { req in
let archive = req.parameters.get("archive")
let _ = ([archive].compacted() + req.parameters.getCatchall()).map { $0.lowercased() }
return ""
}
try app.run()
Run via (⚠️ -c release is important, it does not crash in debug mode):
swift run -c release
Trigger crash via
curl -sL -w "%{http_code}" -o /dev/null http://localhost:8080/crash/1234567890
Reproducer project: https://github.com/finestructure/release-crash-3021
Fixed and deployed.
Crash reported upstream: https://github.com/apple/swift/issues/73477