ktgbotapi

Improve fault resistance

Open BlackBaroness opened this issue 4 months ago • 15 comments

Hello there!

It looks like the bot is currently not as stable as it should be for an app running 24/7. I am not saying it crashes; I mean it does not always try to reconnect after something fails.

For example, I ran the bot with internet access disabled:

08:10:28 [SEVERE] Something web wrong
java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
	at io.ktor.client.engine.cio.UtilsKt$readResponse$2.invokeSuspend(utils.kt:175)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:124)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)

08:10:28 [SEVERE] Exception in thread "DefaultDispatcher-worker-9"
08:10:28 [SEVERE] dev.inmo.tgbotapi.bot.exceptions.CommonBotException: Something went wrong
08:10:28 [SEVERE] 	at dev.inmo.tgbotapi.bot.ktor.base.DefaultKtorRequestsExecutor.execute(DefaultKtorRequestsExecutor.kt:102)
08:10:28 [SEVERE] 	at dev.inmo.tgbotapi.bot.ktor.base.DefaultKtorRequestsExecutor$execute$1.invokeSuspend(DefaultKtorRequestsExecutor.kt)
08:10:28 [SEVERE] 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
08:10:28 [SEVERE] 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:98)
08:10:28 [SEVERE] 	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:124)
08:10:28 [SEVERE] 	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
08:10:28 [SEVERE] 	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
08:10:28 [SEVERE] 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
08:10:28 [SEVERE] 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
08:10:28 [SEVERE] 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)
08:10:28 [SEVERE] 	Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@6e510c71, Dispatchers.IO]
08:10:28 [SEVERE] Caused by: java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
08:10:28 [SEVERE] 	at io.ktor.client.engine.cio.UtilsKt$readResponse$2.invokeSuspend(utils.kt:175)
08:10:28 [SEVERE] 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:34)
08:10:28 [SEVERE] 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
08:10:28 [SEVERE] 	... 6 more

After that error, the library never tried again, even 10 minutes after internet access was restored.

Can you make it try to connect endlessly, please? Since it works somewhere in the background, there is no easy way for the library user to detect the crash and restart the bot.

I believe it should keep trying to reconnect until the library user closes it, possibly with an exponential delay (by the way, there is an HttpRequestRetry plugin with retryOnExceptionIf and exponentialDelay functions, so maybe we don't even need to implement all of that ourselves).
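
For reference, a minimal sketch of what the HttpRequestRetry idea could look like on a plain Ktor client. The retry limit, the 60-second delay cap, and the choice to retry only on IOException are illustrative assumptions, not anything confirmed for ktgbotapi, and the name retryingClient is hypothetical. Whether this would have prevented the failure above is untested.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.plugins.HttpRequestRetry
import java.io.IOException

// Sketch only: a CIO client that retries failed requests with exponential backoff.
// `retryingClient` is a hypothetical name; the limits below are assumptions.
val retryingClient = HttpClient(CIO) {
    install(HttpRequestRetry) {
        // Retry on I/O failures such as the EOFException in the trace above.
        // Int.MAX_VALUE stands in for "practically endless" retries.
        retryOnExceptionIf(maxRetries = Int.MAX_VALUE) { _, cause -> cause is IOException }
        // Grow the delay exponentially between attempts, capped at 60 seconds.
        exponentialDelay(maxDelayMs = 60_000)
    }
}
```

Exceptions that are not IOException subclasses would not be retried by this predicate, so it would likely need widening for other network errors.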

BlackBaroness avatar Aug 02 '25 05:08 BlackBaroness

Welcome with your first issue

github-actions[bot] avatar Aug 02 '25 05:08 github-actions[bot]

Hi, and thank you for your issue :) Judging by your stack traces, I do not see where this could happen. I will try to investigate it, but it works correctly in all my bots. Besides, I see that you used CIO; in most cases it is better to use some other engine (like OkHttp).

InsanusMokrassar avatar Aug 03 '25 07:08 InsanusMokrassar

Please try a different engine and check whether it works :) I will be glad to hear any news.

InsanusMokrassar avatar Aug 03 '25 19:08 InsanusMokrassar

Please try a different engine and check whether it works :) I will be glad to hear any news.

Hi, looks like it kind of fixes it: now I get an exception that can be caught on my side.

telegramBot(token) {
  client = HttpClient(OkHttp)
}

But it would be nice to have CIO supported somehow, since CIO is developed by the Ktor team itself and is really non-blocking (OkHttp uses a thread pool that consumes real threads). There is also the Java engine, which uses HttpClient from the JDK and is non-blocking, but it's not that configurable. CIO looks like the most promising engine now.

I guess we can close this issue or repurpose it for supporting CIO, if you think that's something that should be done.

BlackBaroness avatar Aug 05 '25 00:08 BlackBaroness

Thanks for your response! ^^ I am glad you have found a solution. I will try to investigate ways to solve this problem, but there is some chance that it can only be fixed on the Ktor team's side.

InsanusMokrassar avatar Sep 09 '25 20:09 InsanusMokrassar

Hi, @madhead :) I have added "custom" default engine clients in #1005. You can try 28.0.3 with the default client (it should be available in about 3 hours, I suppose) and reply here whether it has fixed the issue. Thank you :)

InsanusMokrassar avatar Sep 23 '25 12:09 InsanusMokrassar

Hey, but I didn't have any issues with the clients :)

madhead avatar Sep 23 '25 13:09 madhead

Sorry, I clicked your username by mistake :(

@BlackBaroness I meant, could you please try 28.0.3 without passing a custom client engine? :)

InsanusMokrassar avatar Sep 24 '25 04:09 InsanusMokrassar

@InsanusMokrassar hey, sorry, I think I missed a notification about your message

As far as I remember, the main problem was that if you turn off the internet, the library stops receiving updates. I solved this by simply receiving updates manually in my own loop, so I can fully control retries.

I have been using that with CIO for weeks now with no issues so far, so CIO actually works well.

BlackBaroness avatar Oct 11 '25 08:10 BlackBaroness

This code works rock solid with CIO:

        bot = telegramBot(token) { client = HttpClient(CIO) }
        scope.launch(Dispatchers.Default) {
            val allowedUpdates = listOf(UPDATE_MESSAGE, UPDATE_CALLBACK_QUERY)
            var lastUpdateId: UpdateId? = null
            while (isActive) {
                try {
                    val updates = bot.getUpdates(
                        offset = lastUpdateId?.let { it + 1 },
                        timeout = 30,
                        allowed_updates = allowedUpdates
                    )
                    updates.forEach { update -> /* handle */ }
                    lastUpdateId = updates.lastOrNull()?.updateId
                } catch (e: Throwable) {
                    if (e.isRelatedToCancellation || e.isRelatedToTimeout) continue
                    delay(5.seconds)
                }
            }
        }
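
As a possible variation on the fixed 5-second pause in the loop above, here is a small sketch of the exponential delay suggested earlier in this issue. The helper name backoffDelay and its default values are hypothetical and are not part of ktgbotapi or Ktor; it uses only the Kotlin standard library.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper: delay to wait before retry number `attempt` (0-based).
// The delay doubles from `base` on each consecutive failure and is capped at `max`.
fun backoffDelay(attempt: Int, base: Duration = 1.seconds, max: Duration = 60.seconds): Duration {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Cap the shift so the multiplier cannot overflow for large attempt counts.
    val multiplier = 1L shl attempt.coerceAtMost(30)
    return (base * multiplier.toDouble()).coerceAtMost(max)
}
```

In the loop above, this would mean keeping a counter of consecutive failures, resetting it after a successful getUpdates call, and calling delay(backoffDelay(failures)) in place of delay(5.seconds).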

BlackBaroness avatar Oct 11 '25 08:10 BlackBaroness

Hi :) It's OK to miss some notifications (actually, I do it all the time). Thank you for the detailed report :) I checked this scenario (losing internet) on a custom bot from the Telegram Bot API examples, and there is actually one big problem there: it starts spamming UnresolvedAddressException, but when the internet came back, the spamming stopped and the bot returned to normal work. Anyway, I am preparing 30.0.1; could you please check it when I publish the release? I can send you a direct message or an email, if you are OK with that :)

InsanusMokrassar avatar Oct 21 '25 17:10 InsanusMokrassar

Anyway, I am preparing 30.0.1; could you please check it when I publish the release?

OK, I will test losing the internet connection with the default long-polling behavior after the 30.0.1 release. Feel free to DM me if you need anything that's hard to discuss here on GitHub.

BlackBaroness avatar Oct 22 '25 02:10 BlackBaroness

Anyway, I am preparing 30.0.1; could you please check it when I publish the release?

OK, I will test losing the internet connection with the default long-polling behavior after the 30.0.1 release. Feel free to DM me if you need anything that's hard to discuss here on GitHub.

Cool :) By the way, you can try 30.0.1-branch_30.0.1-build3013; it is a preview version of the current PR (#1014), and I will test it today too.

InsanusMokrassar avatar Oct 22 '25 05:10 InsanusMokrassar

I will try this week. Just saying, because I feel like you expected me to do that the same day, but I just can't.

BlackBaroness avatar Oct 23 '25 06:10 BlackBaroness

Excuse me :( I only asked IF you may and IF you wish; I totally understand that you don't have to do anything 👍 Anyway, thank you for the help you have already given, and I am sorry that I sounded like I expected something urgent or binding. I am just a bit dumb in conversation.

InsanusMokrassar avatar Oct 24 '25 08:10 InsanusMokrassar