postgres-nio
Constantly retrying/never failing request to unreachable database
Describe the issue
When using PostgresClient, making requests to an IP:port that isn't serving a Postgres database never throws an error; the client retries forever.
Vapor version
Not using Vapor, but PostgresKit is v2.13.5
Operating system and version
macOS 15.0 beta 1
Swift version
Swift 6 (from Xcode 16 beta 1)
Steps to reproduce
- Create an empty Swift package
- Add PostgresKit as a dependency to the package and the executableTarget
- Write the following code in the main.swift file:
import PostgresKit

let config = PostgresClient.Configuration(
    host: "localhost",
    port: 5432,
    username: "my_username",
    password: "my_password",
    database: "my_database",
    tls: .disable
)

let client = PostgresClient(configuration: config)

try await withThrowingTaskGroup(of: Void.self) { taskGroup in
    taskGroup.addTask {
        await client.run() // !important
    }

    try await client.withConnection { connection in
        let db = connection.sql()
        let row = try await db.raw("SELECT 1 as healthy").first()
        print(try row?.decode(column: "healthy", as: Int.self))
    }
}
- Run the project, making sure there is no Postgres server running and replying on localhost:5432
Outcome
The db.raw call never returns, and the following log can be seen repeatedly in the console:
nw_endpoint_flow_failed_with_error [C37.1.2 127.0.0.1:5432 in_progress socket-flow (satisfied (Path is satisfied), viable, interface: lo0)] already failing, returning
nw_endpoint_flow_failed_with_error [C37.1.2 127.0.0.1:5432 cancelled socket-flow ((null))] already failing, returning
The expected behaviour would be that, after a given configurable number of retries, the method throws an error. As of now, there seems to be no way to detect a failing connection to the database.
Additional notes
My machine is an M1 Ultra. I'm running the latest macOS and Xcode betas, and sadly I can't test with a stable version of macOS.
Probably related: https://github.com/vapor/postgres-nio/pull/487
@MahdiBM I've tried to pin postgres-nio to the merge commit of the PR mentioned above, and can confirm the issue is still present.
Here is the Package.swift file (the main.swift above remains unchanged):
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "PostgresClientRepro",
    platforms: [.macOS(.v13)],
    dependencies: [
        .package(url: "https://github.com/vapor/postgres-kit.git", .upToNextMajor(from: "2.13.5")),
        .package(url: "https://github.com/vapor/postgres-nio.git", branch: "f55caa7"),
    ],
    targets: [
        .executableTarget(
            name: "PostgresClientRepro",
            dependencies: [
                .product(name: "PostgresKit", package: "postgres-kit"),
                .product(name: "PostgresNIO", package: "postgres-nio"),
            ]
        ),
    ]
)
While writing this comment the query still hasn't returned (430+ seconds).
branch: "f55caa7"
Does it even work with that?!
Can you try "main" instead now that the PR is merged?
You could use revision: "SHA_HERE", but I hadn't seen branch: used with a SHA before ... not sure if it's just me not knowing that branch also accepts SHAs, or if it's incorrect.
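For reference, pinning to an exact commit with SwiftPM's revision: parameter would look like this (reusing the SHA from the Package.swift above):

// Pins postgres-nio to an exact commit instead of tracking a branch.
.package(url: "https://github.com/vapor/postgres-nio.git", revision: "f55caa7"),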
@abidon So there are a couple of things to unpack here:
- We don't enforce any timeouts in PostgresClient, and we consider this a feature of PostgresNIO right now. Instead, we hope that Swift introduces a timeout API like the one below. In the meantime, you can enforce timeouts yourself:
// The result of racing the work against the deadline.
enum TimeoutResult<Success: Sendable> {
    case deadlineHit
    case deadlineCancelled
    case workFinished(Result<Success, any Error>)
}

func timeout<Clock: _Concurrency.Clock, Instant: InstantProtocol, Success: Sendable>(
    deadline: Instant,
    clock: Clock,
    _ closure: @escaping @Sendable () async throws -> Success
) async throws -> Success where Clock.Instant == Instant {
    let result = await withTaskGroup(of: TimeoutResult<Success>.self, returning: Result<Success, any Error>.self) { taskGroup in
        taskGroup.addTask {
            do {
                try await clock.sleep(until: deadline, tolerance: nil)
                return .deadlineHit
            } catch {
                return .deadlineCancelled
            }
        }

        taskGroup.addTask {
            do {
                let success = try await closure()
                return .workFinished(.success(success))
            } catch let error {
                return .workFinished(.failure(error))
            }
        }

        var r: Swift.Result<Success, any Error>?
        while let taskResult = await taskGroup.next() {
            switch taskResult {
            case .deadlineCancelled:
                continue // loop
            case .deadlineHit:
                taskGroup.cancelAll()
            case .workFinished(let result):
                taskGroup.cancelAll()
                r = result
            }
        }
        return r!
    }
    return try result.get()
}
Then you can use it like this:
let clock = ContinuousClock()
try await timeout(deadline: clock.now + .seconds(10), clock: clock) {
    let rows = try await client.query("SELECT *")
    for try await row in rows {
        // process row
    }
}
- One issue we currently have in PostgresNIO, however, is that with the approach above the API would tell you that the request failed because it got cancelled. So we should consider alternative error handling for timeouts.
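In the meantime, a caller-side workaround could translate the cancellation into a dedicated error. Below is a minimal sketch, assuming the cancelled query surfaces as a CancellationError (the concrete error type may differ); QueryTimeoutError and withQueryTimeout are hypothetical names, not part of PostgresNIO:

// Hypothetical error type so callers can tell a timeout apart from
// other failures; this does not exist in PostgresNIO.
struct QueryTimeoutError: Error {}

func withQueryTimeout<Success: Sendable>(
    after duration: Duration,
    _ closure: @escaping @Sendable () async throws -> Success
) async throws -> Success {
    let clock = ContinuousClock()
    do {
        return try await timeout(deadline: clock.now + duration, clock: clock, closure)
    } catch is CancellationError {
        // The helper cancels the work when the deadline is hit, so a
        // CancellationError here most likely means the deadline fired.
        throw QueryTimeoutError()
    }
}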
Hope this helps.
Can you try "main" instead now that the PR is merged?
@MahdiBM: It was indeed the last commit hash on main, the one from the PR merge. And yeah, I've always used branch with tags, branches and commit hashes alike.
You could use revision: "SHA_HERE"
In fact I wasn't aware of revision 😅 Looks like we both learnt one thing 😊
Semantically speaking, revision sounds better; the fact that SHAs work with branch might just be a happy coincidence.
@fabianfett Thanks for the explanation and the code excerpt!
That being said, I find it slightly weird to work around this specific problem with a timeout. The connection is not hanging; it's never even established.
Having to wait, say, 5 seconds when we know after a couple of milliseconds that the connection cannot be made just puts unnecessary pressure on the backend with long-pending requests.
On the other hand, setting a deliberately small timeout to detect failing connections faster could cause false positives, with timeouts occurring on properly established connections.
Question: in addition to user-implemented timeouts (thanks to the code excerpt you provided), would it be reasonable to have something like a configurable maxAttempts: UInt? on PostgresClient.Configuration (defaulting to nil so as not to break the current behaviour of retrying forever)?
From my understanding of the code, this maxAttempts could then be used here.
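For illustration, here is a rough sketch of what the option could look like; the name maxAttempts and everything below are hypothetical and nothing here exists in PostgresNIO today:

// Hypothetical addition to PostgresClient.Configuration.
struct Configuration {
    // ... existing fields (host, port, username, ...) ...

    /// Maximum number of consecutive failed connection attempts before
    /// requests waiting on a connection are failed with an error.
    /// `nil` keeps the current behaviour of retrying forever.
    var maxAttempts: UInt? = nil
}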
If you think this could be a good solution, I could attempt a PR. Any additional guidance and/or API design/naming would be appreciated.