It looks like `sendMany()` is not sending all notifications
Thank you for this great and easy to use library.
While validating our implementation, I stumbled upon an issue with sendMany().
It might be my misunderstanding, or a potential issue with your code.
Environment:
- macOS: 14.2.1 (latest)
- Node.js: 16.6.0
- apns2 lib: 11.4.0 (latest)
I have an array of 321 device tokens from the APNS Production server from an App Store released app.
The idea is that when sending notifications to these tokens via the APNS Sandbox (thus host: 'api.sandbox.push.apple.com' in ApnsClient()), that APNS will return an error for each token.
Original setup
Based on your documentation:
Add an error listener that increments a counter:
// Listen for any error:
client.on(Errors.error, (err) => {
console.error("APNS error:", err.reason, err.statusCode)
// Increment:
failCount++;
})
Create array with 321 notifications, for each of the device tokens:
// Create notification for each token:
const notifications = [];
for (const token of schedule.tokens) {
notifications.push(new Notification(token, alert))
}
console.log("notifications length: " + notifications.length)
Then sendMany() is used:
// Step 2: Send Push Note(s):
try {
await client.sendMany(notifications)
} catch (err) {
console.error("ERROR: " + err)
// Errors ar handled in error listeners.
}
console.log("Completed. Number of notifications failed: " + failCount)
However. after running the script, the 2 console logs do not match:
notifications length: 321 Completed. Number of notifications failed: 100
Also the log file exactly shows 100 times "APNS error: BadDeviceToken 400" from client.on(Errors.error).
I would expect 321 fails, because all notifications must fail at the sandbox server.
What is really strange is that there is no console log at all starting with "ERROR: ", which is added in the catch of sendMany().
Alternative: for loop with send()
Commented the entire try/catch with sendMany() and added a simple for loop with send():
// Step 2: Send Push Note(s):
// try {
// await client.sendMany(notifications)
// } catch (err) {
// console.error("ERROR: " + err)
// // Errors ar handled in error listeners.
// }
for (const notification of notifications) {
try {
await client.send(notification)
} catch (err) {
console.error("ERROR: " + err.reason)
}
}
console.log("Completed. Number of notifications failed: " + failCount)
When running the script now, it's understandably slower, since it waits for completion of every notification. However, the numbers are matching closer:
notifications length: 321 Completed. Number of notifications failed: 312
There is a difference of 9 left. Which I can explain as well, by comparing the console logs from client.on(Errors.error) and send().
Not every function is called once per token, as can be seen:
APNS error: BadDeviceToken 400 ERROR: BadDeviceToken ERROR: undefined APNS error: BadDeviceToken 400 ERROR: BadDeviceToken
An undefined error is reported in send(), but not in client.on(Errors.error). I later added console.log("APNS error with error: " + err) to log the entire error (see also my comment about retry behaviour):
APNS error with error: Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_REFUSED_STREAM APNS error with reason: undefined
When moving failCount++; from client.on(Errors.error) to the catch of send(), the number do match:
notifications length: 321 Completed. Number of notifications failed: 321
My questions:
- Shouldn't error
undefinednot properly be handled inclient.on(Errors.error)? - Why is no error at all shown in the catch of
sendMany()? - Why is
sendMany()not sending all notifications?
As added to the initial description, I've figured out that the undefined error is an Error object with code "ERR_HTTP2_STREAM_ERROR".
I also learnt that although send() failed with this stream error, the next call to send() for the next notification would restore the stream.
Hence I added an ugly work-around to retry send() upon this specific stream error, by repeating the same logic (except that the second catch doesn't retry anymore):
for (const notification of notifications) {
try {
await client.send(notification)
} catch (err) {
console.log("APNS error with reason: " + (err.reason !== undefined ? err.reason : err))
if (err.reason !== undefined) {
// Handle issues with Token:
if (err.reason == Errors.badDeviceToken || err.reason == Errors.unregistered) {
if (err.notification.deviceToken !== undefined) {
console.log(`APNS token '${err.notification.deviceToken}' failed or unregistered -> remove`)
// Add to array to delete token from MKCloud:
badDeviceTokens.push(err.notification.deviceToken)
}
}
// Increment:
failCount++
}
else if (err.code === 'ERR_HTTP2_STREAM_ERROR') {
console.warn("ERR_HTTP2_STREAM_ERROR -> Retry")
try {
await client.send(notification)
} catch (err) {
console.log("APNS error with reason: " + (err.reason !== undefined ? err.reason : err))
if (err.reason !== undefined) {
// Handle issues with Token:
if (err.reason == Errors.badDeviceToken || err.reason == Errors.unregistered) {
if (err.notification.deviceToken !== undefined) {
console.log(`APNS token '${err.notification.deviceToken}' failed or unregistered -> remove`)
// Add to array to delete token from MKCloud:
badDeviceTokens.push(err.notification.deviceToken)
}
}
}
else if (err.code === 'ERR_HTTP2_STREAM_ERROR') {
console.warn("ERR_HTTP2_STREAM_ERROR -> give up")
}
// Increment:
failCount++
}
}
}
}
console.log("Completed. Number of notifications failed: " + failCount)
I hope you can provide a better solution, with an auto-retry in both send() and sendMany() for these kind of underlying transport errors.
Thanks so much for the detailed issue. I'm looking into this today and will update you with findings.
I think sendMany needs some sort of concurrency control. Right now we will try and send 300+ notifications concurrently which will most likely overwhelm the shared connection. Retry logic seems appropriate too, I think that will be fairly straight forward to implement.
Thanks for your prompt response. Basically I reported 2 different topics in this issue, I'm happy to move the retry topic to a new issue for a better follow-up.
Here are some examples of other libs that use concurrency, those might give you a hint:
- https://github.com/Pr0Ger/PyAPNs2
- https://github.com/sideshow/apns2
Please note that the 300+ notifications was just a test batch. Our iOS app has hundreds of thousands of users, so the real deal will be serious.
I did a test today with ~6K tokens in the FOR loop. After a while, I noticed new connection errors:
APNS token '61946e80bcc39a335aeb821048f6b47555574ab4647fe175c134e6f97668dfb4' failed or unregistered -> remove
APNS error with reason: Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_REFUSED_STREAM
ERR_HTTP2_STREAM_ERROR -> Retry
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: BadDeviceToken
APNS token '7476f0bc63df0d51c0866a07775b01e0b2479d8835b04ffaf7a21329b88825bc' failed or unregistered -> remove
Although it seemed to recover, these logs were seen more and more often. In the end, after 18 minutes the script froze in this situation:
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: read ECONNRESET)
APNS error with reason: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: Client network socket disconnected before secure TLS connection was established)
This means that the for loop with send() can not be used as work-around for sendMany():
- It's incapable of sending 6K notifications, without connection issues
- 18 minutes for 6K notification is unacceptable; we need to send tens of thousands within an hour. Other libraries claim to deliver at least 4K per second.
@AndrewBarba Please let me know if you need any more information or specific tests from me. Happy to contribute.
I think I've found the answer on the 2 main questions:
- Why is
sendMany()not sending all notifications? - Why so many disconnects when using
send()?
Out of curiosity I searched for another library and found this one: https://github.com/parse-community/node-apn
I implemented the logic, applied their send() on an array of tokens and something similar occurred: after exactly 100 times BadDeviceToken it got connection issues.
After some search on the internet, it could refer to something with the credentials.
So I decided to change from sandbox to production.
That cleared all issues with their lib; send() only came back with a few Unregistered and did send hundreds of notifications per second. ✅
To be honest, I haven't checked your lib with production environment yet, since that other lib is now integrated in my code. Will checkout an old commit tomorrow and verify.
For your lib, still these questions remain:
- Shouldn't error
undefinednot properly be handled inclient.on(Errors.error)? - Why is no error at all shown in the catch of
sendMany()?
- I wouldn't expect an undefined error to trigger any event handlers but will check
sendManyis designed to never throw an error, but instead return array of the results. This might not be a great API design and part of me thinkssendManyjust shouldn't exist. If you're sending thousands of notification I wouldn't expect those to live in memory to begin with (most likely streaming out of a db) so send many is a bit weird. I think usingsend()and then controller concurrency in user space is probably a better use