Improve fault resistance
Hello there!
It looks like the bot is currently not as stable as it should be for an app running 24/7. I am not saying it crashes; I mean it does not always try to reconnect after something fails.
For example, I ran the bot with internet access disabled:
08:10:28 [SEVERE] Something web wrong
java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
at io.ktor.client.engine.cio.UtilsKt$readResponse$2.invokeSuspend(utils.kt:175)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:124)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)
08:10:28 [SEVERE] Exception in thread "DefaultDispatcher-worker-9"
08:10:28 [SEVERE] dev.inmo.tgbotapi.bot.exceptions.CommonBotException: Something went wrong
08:10:28 [SEVERE] at dev.inmo.tgbotapi.bot.ktor.base.DefaultKtorRequestsExecutor.execute(DefaultKtorRequestsExecutor.kt:102)
08:10:28 [SEVERE] at dev.inmo.tgbotapi.bot.ktor.base.DefaultKtorRequestsExecutor$execute$1.invokeSuspend(DefaultKtorRequestsExecutor.kt)
08:10:28 [SEVERE] at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
08:10:28 [SEVERE] at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:98)
08:10:28 [SEVERE] at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:124)
08:10:28 [SEVERE] at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
08:10:28 [SEVERE] at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
08:10:28 [SEVERE] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
08:10:28 [SEVERE] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
08:10:28 [SEVERE] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)
08:10:28 [SEVERE] Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@6e510c71, Dispatchers.IO]
08:10:28 [SEVERE] Caused by: java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
08:10:28 [SEVERE] at io.ktor.client.engine.cio.UtilsKt$readResponse$2.invokeSuspend(utils.kt:175)
08:10:28 [SEVERE] at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
08:10:28 [SEVERE] at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
08:10:28 [SEVERE] ... 6 more
After that error the library never tried again, even 10 minutes after internet access was restored.
Can you make it retry the connection endlessly, please? Since it works somewhere in the background, there is no easy way to detect the crash and restart the bot from the library user's side.
I believe it should keep trying to reconnect until the library user closes it, possibly with an exponential delay (by the way, there is an HttpRequestRetry plugin with retryOnExceptionIf and exponentialDelay functions, so maybe we don't even need to implement all that ourselves).
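For illustration, here is a minimal sketch of how the HttpRequestRetry plugin mentioned above might be installed on a CIO client. The retry limit of 10 and the `IOException` predicate are assumptions chosen for the example, not something the library does today, and it is untested whether this alone would keep long polling alive after an outage.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.plugins.HttpRequestRetry
import java.io.IOException

// Assumption: retry only on I/O failures (such as the EOFException in the log above),
// at most 10 times per request; the plugin does not retry endlessly.
val retryingClient = HttpClient(CIO) {
    install(HttpRequestRetry) {
        retryOnExceptionIf(maxRetries = 10) { _, cause -> cause is IOException }
        // Plugin defaults: the delay grows exponentially between attempts.
        exponentialDelay()
    }
}
```

Such a client could then be passed to the bot builder via `client = retryingClient`.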
Welcome with your first issue
Hi, and thank you for your issue :) Judging by your stack traces, I do not see where this may happen. I will try to investigate it, but in all my bots it works correctly. Besides, I see that you have used CIO; in most cases it is better to use some other engine (like OkHttp).
Please try a different engine and check whether it works :) I will be glad to hear any news.
Hi, it looks like that more or less fixes it; now I get an exception that can be caught on my side:
telegramBot(token) {
    client = HttpClient(OkHttp)
}
But it would be nice to have CIO supported somehow, since CIO is developed by the Ktor team itself and is truly non-blocking (OkHttp uses a thread pool that consumes real threads). There is also the Java engine, which uses HttpClient from the JDK and is non-blocking, but it's not that configurable. CIO looks like the most promising engine now.
I guess we can close this issue or repurpose it for supporting CIO, if you think that's something that should be done.
Thanks for your response! ^^ I am glad you have found a solution. I will try to investigate ways to solve this problem, but there is some chance that it can only be fixed on the Ktor team's side.
Hi, @madhead :) I have put "custom" default engine clients in #1005. You may try 28.0.3 with the default client in, I suppose, about 3 hours and reply here whether it has fixed the issue. Thank you :)
Hey, but I didn't have any issues with the clients :)
Sorry, I clicked your username by mistake :(
@BlackBaroness I meant, could you please try 28.0.3 without passing a custom client engine? :)
@InsanusMokrassar hey, sorry, I think I missed a notification about your message
As far as I remember, the main problem was that if you turn off the internet, the library stops receiving updates. I solved this by simply receiving updates manually in my own loop, so I can fully control retries.
I have been using that with CIO for weeks now with no issues so far, so CIO actually works well.
This code works rock solid with CIO:
bot = telegramBot(token) { client = HttpClient(CIO) }
scope.launch(Dispatchers.Default) {
    val allowedUpdates = listOf(UPDATE_MESSAGE, UPDATE_CALLBACK_QUERY)
    var lastUpdateId: UpdateId? = null
    while (isActive) {
        try {
            val updates = bot.getUpdates(
                offset = lastUpdateId?.let { it + 1 },
                timeout = 30,
                allowed_updates = allowedUpdates
            )
            updates.forEach { update -> /* handle */ }
            lastUpdateId = updates.lastOrNull()?.updateId
        } catch (e: Throwable) {
            if (e.isRelatedToCancellation || e.isRelatedToTimeout) continue
            delay(5.seconds)
        }
    }
}
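The loop above waits a fixed five seconds after each failure. If an exponential delay is preferred, a helper along the following lines could compute the pause. This is only a sketch: `backoffDelay` and its parameters are hypothetical names, not part of tgbotapi or Ktor.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper: pause before the next attempt after `failures`
// consecutive failures. The pause doubles each time and is capped at `max`.
fun backoffDelay(
    failures: Int,
    base: Duration = 1.seconds,
    max: Duration = 60.seconds,
): Duration {
    require(failures >= 1) { "failures must be at least 1" }
    // Cap the exponent so the shift below cannot overflow an Int.
    val exponent = (failures - 1).coerceAtMost(30)
    val factor = 1 shl exponent // 2^(failures - 1)
    return minOf(base * factor, max)
}
```

In the catch branch one could count consecutive failures, call `delay(backoffDelay(failures))`, and reset the counter after a successful `getUpdates` call.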
Hi :) It's OK to miss some notifications (actually, I do it all the time). Thank you for the detailed report :) I checked this scenario (losing internet) on a custom bot from the Telegram Bot API examples, and there is actually one big problem there: it starts spamming UnresolvedAddressException, but when the internet came back, it stopped spamming and returned to normal work. Anyway, I am preparing 30.0.1; could you please check it when I publish the release? I may send you a direct message or email, if you are OK with that :)
> Anyway, I am preparing 30.0.1; could you please check it when I publish the release?
OK, I will test losing the internet connection with the default long-polling behavior after the 30.0.1 release. Feel free to DM me if you need anything that's hard to discuss here on GitHub.
Cool :) By the way, you may try 30.0.1-branch_30.0.1-build3013; it is a preview version of the current PR (#1014), and I will test it today too.
I will try this week. Just saying, because I feel like you expected me to do that the same day, but I just can't.
Excuse me :( I just asked IF you may and IF you wish; I totally understand that you don't have to do anything 👍 Anyway, thank you for the help you have already given, and I am sorry if I sounded like I expected something urgent or binding. I am just a bit dumb in conversation.