Random misleading Unknown Interaction errors

Open ImRodry opened this issue 2 years ago • 56 comments

Description

I've seen this issue reported by many people, but so far no one has been able to gather enough information to reliably explain what's going on. An example can be seen at https://github.com/discordjs/discord.js/issues/7005.

In summary, every now and then, seemingly at random, a bot's reply to an interaction fails with an Unknown Interaction error when, in reality, the reply succeeded and was shown to the user (by reply I mean a regular reply, a deferred reply or an update). I know this because I've been investigating this issue on a bot I manage for around a week now, and I asked some of the users who were affected.

In the following screenshots I'm logging the time it took me to reply, by subtracting the interaction's created_timestamp from the current timestamp, and the time it took the bot to receive the error, by subtracting the timestamp taken just before the request was submitted from the timestamp at which the error was received. You can see that the reply is sent quickly and in time for Discord to accept it; however, the error arrives 5 seconds later, indicating some sort of issue on Discord's end.

[image]

Of course I could be faking those numbers, but it would make no sense for me to do that, so I'm going to have to ask you to trust them. I later asked the affected user what the bot had responded with, and they showed that the reply was indeed deferred, which means the error was a false positive and everything worked fine on our end.
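For readers who want to take the same two measurements, here is a minimal sketch in Python. It assumes the interaction's creation time is available in Unix milliseconds; `timed_response` and its `send` argument are hypothetical names for illustration, not part of discord.js or any other library.

```python
import asyncio
import time


async def timed_response(created_at_ms, send):
    """Await send() and report the two durations described above.

    created_at_ms: creation time of the interaction, in Unix milliseconds.
    send: hypothetical zero-argument coroutine function that performs the
          library's reply/defer call.
    """
    sent_wall_ms = time.time() * 1000   # wall clock, comparable to created_at_ms
    started = time.monotonic()          # monotonic clock for the wait itself
    error = None
    try:
        await send()
    except Exception as exc:            # e.g. the library's "Unknown interaction" error
        error = exc
    return {
        # How old the interaction was when the response was submitted.
        "age_at_send_ms": sent_wall_ms - created_at_ms,
        # How long the call took to succeed or fail.
        "waited_ms": (time.monotonic() - started) * 1000,
        "error": error,
    }
```

A small `age_at_send_ms` together with a `waited_ms` of several seconds and a non-empty `error` would match the pattern reported here.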

Steps to Reproduce

There are no steps to consistently reproduce this issue as it only happens randomly. What I can tell is that the error comes when the API takes too long to send the response back but actually acknowledges and processes it.

Expected Behavior

The reply is sent correctly (happening) and a success message is returned

Current Behavior

The reply is sent correctly but an "Unknown Interaction" error is thrown

Screenshots/Videos

I can only attach what I've already shown above. [image] [image] ("Bot is thinking", but in Portuguese)

Client and System Information

discord.js v14.6.0 on Node v18.11.0 running on Debian 11 (bullseye)

ImRodry avatar Oct 17 '22 18:10 ImRodry

Can you provide a code snippet showing how these logs were generated? I'm curious as to whether it is all coming from one request or possibly retries. There are a number of interacting systems here, so additional information to help debug the issue would be beneficial.

DV8FromTheWorld avatar Oct 17 '22 19:10 DV8FromTheWorld

afaik discord.js only retries a request when it gets a 429 response, which I assume is not the case here on freshly created interactions, so to my knowledge no retries are happening here. [image] This is the first line executed in the entire event that isn't an if statement, and nothing above it interacts with the API. Hope this helps

ImRodry avatar Oct 17 '22 19:10 ImRodry

Keep in mind this issue can happen with other kinds of interaction replies, not only deferred messages (I tested with showing a modal). I only showed that snippet because it's the most basic one that should never generate that error

ImRodry avatar Oct 17 '22 19:10 ImRodry

Hey @DV8FromTheWorld do you have any updates on this?

ImRodry avatar Oct 22 '22 15:10 ImRodry

I have been getting this too. We check whether 3 seconds have passed, and they definitely have not at the time of the request, but we sometimes get this response anyway. [email protected]

ooliver1 avatar Oct 22 '22 23:10 ooliver1

Yeah, I have been getting this error as well. And I haven't changed my code since updating Discord.js to v14.6.0.

kenyonbowers avatar Oct 23 '22 03:10 kenyonbowers

I have not looked deeper into this issue at this time. This is the first time I've heard of this issue. Before assuming it is a problem with Discord I would likely investigate the underlying library implementation.

For debugging purposes: Is there a way in your library (or tech stack) to track outbound network traffic? If there is, it would be useful for indicating whether the library is re-attempting a network call or whether the initial network call is actually taking 5 seconds. That isn't possible to determine from your code snippet.

DV8FromTheWorld avatar Oct 24 '22 14:10 DV8FromTheWorld

I’m not sure if there is but I can dig into the source code and add that myself. I do, however, doubt that is the case, as we’ve seen @ooliver1 say they are experiencing the same behavior and they’re using a python library, which is completely different from the one I’m using

ImRodry avatar Oct 24 '22 17:10 ImRodry

It's pretty hard to reproduce confidently, since it's been random a lot of the time

ooliver1 avatar Oct 24 '22 17:10 ooliver1

I have found that if the bot is left running for multiple hours after starting, the error stops appearing; it comes back if you turn the bot off and run it again without letting it sit.

kenyonbowers avatar Oct 24 '22 17:10 kenyonbowers

@DV8FromTheWorld I believe there is not much more debugging I can do here. Because this issue happens at random and requires a high volume of interactions, it would be impossible to gather enough data to tell exactly why it's happening. All I can tell is that, on discord.js, after calling deferReply() the request is sent to this method, which I am not familiar with, and I would probably need to spend a lot of time figuring out all the quirks of this class and the whole package itself. I would, however, like to emphasize that I've seen people face this issue long before discord.js had this rest package, and people using other languages and libraries claim to be facing the same thing, so could you look into this? If needed, I can start gathering timestamps of when this issue happens and send them to you; I just can't log anything from the internal parts of the library, unfortunately.

ImRodry avatar Oct 30 '22 19:10 ImRodry

If it helps, I also get totally random, out of nowhere, “unknown interaction” errors in my bot logs [I run a bot using the discord.py library, so totally unrelated to the OP, who uses discord.js] when sending a response to an interaction. In my case, it's just an immediate ephemeral response message [e.g. interaction.response.send_message(mymsghere, ephemeral=True)], rather than a deferred response followed by use of the followup webhook.

I’ve never bothered trying to work out why it happens, since the error traceback suggests it's more likely to be a Discord issue than anything to do with anyone's library implementation [unless every single lib dev has implemented interactions wrongly for 2 years lol]. Also, it’ll happen once, then never again for several days, usually when I’m sleeping [i.e. overnight], so it's hardly something I can spend time debugging, since there’s no chance I’ll be able to find out why it's happening.

SuperSajuuk avatar Oct 30 '22 23:10 SuperSajuuk

I'm facing some issues with showing a Modal in my bot. The same code works 99% of the time, but in some cases the interaction returns Unknown Interaction when trying to show the Modal. When I replace the Modal with a Reply to the interaction, it works every time. But as soon as I revert to using the Modal it starts failing again. This happens in certain button interactions set through certain slash command data. The issue persists even if I repost that post. But if I try posting it again with the same data, the error persists.

muhitrhn avatar Dec 09 '22 21:12 muhitrhn

Have been getting random unknown interaction errors as well on deferReply() and showModal() rarely. Decided to check how much time each reply is taking (even though everything is deferred) using console.time() and console.timeEnd() and surprisingly that one day no errors occurred.

yash1441 avatar Dec 14 '22 02:12 yash1441

Would someone with this issue be willing to provide a complete, runnable code sample that reproduces this issue? It's very difficult to figure out if this is even a bug or not

yonilerner avatar Dec 21 '22 22:12 yonilerner

@yonilerner like I've said above, simply set up an event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible code sample because it really is random

ImRodry avatar Dec 21 '22 23:12 ImRodry

event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible

Conklins avatar Dec 24 '22 00:12 Conklins

The problem here is that there isn't enough information here to actually debug anything. I recognize that people are occasionally receiving "Unknown Interaction", but that usually indicates a problem with the developer's code.

Personally, I would try capturing a variety of information:

  1. Capture network logs. i) Ensure there are no retries. ii) Ensure the network request is actually being sent to Discord, as opposed to being queued for some number of seconds due to ratelimiting and thus exceeding the time limit.
    • I include this bullet point because in the screenshots I'm seeing a multi-second delay before the error is received, which, to me, indicates a ratelimiter is holding things up. The fact that the error came from the "sequential requester" further makes me think that is at play.
  2. Time the event was received
  3. Time the event was supposedly responded to
  4. The time the request was actually sent by the network requester
  5. The type of event response (deferred reply, etc)
  6. Information about the internal ratelimiter to see if anything was triggered
  7. Generally I'd turn on any debug-logging around the network layer / requester
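The checklist above could be captured as one record per interaction response. This is a minimal sketch in Python, assuming the library exposes hook points where each timestamp can be taken; the `InteractionTimeline` class, its field names and the `diagnosis` labels are hypothetical and not part of any library. The 3-second default is the response deadline discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class InteractionTimeline:
    """One record per interaction response; all times in Unix milliseconds."""
    received_ms: float       # event was received by the bot
    responded_ms: float      # bot code called reply/defer
    request_sent_ms: float   # network requester actually sent the request
    finished_ms: float       # response (or error) came back
    response_type: str       # e.g. "deferred reply", "modal"
    attempts: int = 1        # more than 1 means the request was retried

    def queue_delay_ms(self) -> float:
        """Time the request waited inside the library, e.g. in a ratelimiter."""
        return self.request_sent_ms - self.responded_ms

    def network_ms(self) -> float:
        """Time between sending the request and getting an answer."""
        return self.finished_ms - self.request_sent_ms

    def diagnosis(self, deadline_ms: float = 3000) -> str:
        if self.attempts > 1:
            return "retried"
        if self.request_sent_ms - self.received_ms > deadline_ms:
            return "sent too late"
        if self.network_ms() > deadline_ms:
            return "slow round trip"
        return "ok"
```

A record that comes out as "sent too late" would point at the bot or library side, while "slow round trip" with a small `queue_delay_ms()` would point at the network or the API.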

Unfortunately, until we have better concrete information with a timeline of events in a failed interaction request there isn't a ton we can do here.

DV8FromTheWorld avatar Jan 05 '23 20:01 DV8FromTheWorld

Alright thank you, I will try to get that information for you. Unfortunately it might not be very easy since my bot is using a package and it's hard to get that info from the package itself on prod, but I'll look into it

ImRodry avatar Jan 06 '23 00:01 ImRodry

For what it's worth, with the increasing number of times we've seen this, I decided to finally look into it a bit. In djs there shouldn't be anything getting in the way of the request firing, but I am implementing a separate request handler specifically for interaction callbacks. While in theory this won't change the external facing behavior of the request, it should at least streamline the process and make it a little easier to debug.

ckohen avatar Jan 06 '23 04:01 ckohen

Been a few months here so I'm assuming the behavior isn't being seen anymore.

devsnek avatar Mar 07 '23 18:03 devsnek

Been a few months here so I'm assuming the behavior isn't being seen anymore.

oh no it definitely is, every single day, multiple times, I just don't have the time nor patience to debug things to the level you guys asked for

ImRodry avatar Mar 08 '23 00:03 ImRodry

Same here, has become a part of my life now.

muhitrhn avatar Mar 08 '23 05:03 muhitrhn

Been a few months here so I'm assuming the behavior isn't being seen anymore.

Still happens, even had it earlier today lol. It's just something I'm used to seeing at random now, and haven't really bothered to care about since there's no immediate impact to my bot. That being said, the source of what causes this problem needs to be resolved so people aren't confused by random misleading errors.

SuperSajuuk avatar Mar 08 '23 19:03 SuperSajuuk

I'm excited to find this issue! This has been very annoying the past few weeks. Some of my findings:

  • Like ImRodry has said: it can be fine for hours at a time and then will happen a bunch in a row. It's hard to reproduce, but it's happening. Repeating the same command a second time usually works fine.
  • I've been using Discord.js 14.7.1, and am now using 14.8, and can eventually reproduce the error on my bot with enough tries.
  • I moved my bot to a Linode with a dedicated CPU and 4GB of ram, still got the error. Unless it's an issue with Linode (a primary VPS provider), it's not a hardware issue or a shared CPU issue.
  • It doesn't matter if it's a slash command or a button. It just seems to pick an interaction response and fail to run it in time
  • In frustration of this issue, all my interactions now deferReply as soon as possible and it feels it doesn't matter how quickly the deferReply happens, it will sometimes just take longer than 3 seconds to respond
  • I added some logs to see how long each part of the code takes and it's all super fast until it will randomly hang on the deferReply. There are no promises running before the deferReply and you can see that it only takes a few ms to get to the deferReply

The architecture of my bot commands:

Example 1, this works fine:
2023-03-14 15:55:25.697 [INFO] [interactionCreate] interactionCreate event started at 1678827325696
2023-03-14 15:55:25.698 [INFO] [interactionCreate] Decided to run slash command in 1ms
2023-03-14 15:55:25.699 [INFO] [commandRun] commandRun started at 1678827325698
2023-03-14 15:55:25.700 [INFO] [commandRun] Executed the command in 1ms
2023-03-14 15:55:25.700 [INFO] [d.botstats] Command started at 1678827325700
2023-03-14 15:55:25.702 [INFO] [d.botstats] Attempting to defer reply...
2023-03-14 15:55:25.918 [INFO] [d.botstats] Reply deferred in 218ms

Example 2, happened right after example 1. Note how it takes < 10 milliseconds to get to the point where it tries to defer reply, and then fails:
2023-03-14 15:55:26.979 [INFO] [interactionCreate] interactionCreate event started at 1678827326973
2023-03-14 15:55:26.980 [INFO] [interactionCreate] Decided to run slash command in 6ms
2023-03-14 15:55:26.981 [INFO] [commandRun] commandRun started at 1678827326980
2023-03-14 15:55:26.981 [INFO] [commandRun] Executed the command in 1ms
2023-03-14 15:55:26.982 [INFO] [d.botstats] Command started at 1678827326981
2023-03-14 15:55:26.984 [INFO] [d.botstats] Attempting to defer reply...
2023-03-14 15:55:31.401 [INFO] [commandRun] ERROR: DiscordAPIError[10062]: Unknown interaction
    at SequentialHandler.runRequest (/usr/src/app/node_modules/@discordjs/rest/src/lib/handlers/SequentialHandler.ts:498:11)
    at runMicrotasks ()
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async SequentialHandler.queueRequest (/usr/src/app/node_modules/@discordjs/rest/src/lib/handlers/SequentialHandler.ts:198:11)
    at async REST.request (/usr/src/app/node_modules/@discordjs/rest/src/lib/REST.ts:343:20)
    at async ChatInputCommandInteraction.deferReply (/usr/src/app/node_modules/discord.js/src/structures/interfaces/InteractionResponses.js:69:5)
    at async Object.execute (/usr/src/app/src/discord/commands/guild/d.botstats.ts:26:5)
    at async commandRun (/usr/src/app/src/discord/utils/commandRun.ts:37:5)
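To make the difference between the two examples concrete, the gap between "Attempting to defer reply..." and the next log line can be computed from the timestamps above. This is a small Python sketch; `gap_ms` is a hypothetical helper name, and the timestamps are copied from the two log excerpts.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp format used in the logs above


def gap_ms(start: str, end: str) -> int:
    """Milliseconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() * 1000)


# Example 1: "Attempting to defer reply..." -> "Reply deferred"
ok_ms = gap_ms("2023-03-14 15:55:25.702", "2023-03-14 15:55:25.918")
# Example 2: "Attempting to defer reply..." -> error logged
failed_ms = gap_ms("2023-03-14 15:55:26.984", "2023-03-14 15:55:31.401")
print(ok_ms, failed_ms)  # 216 4417
```

In the failing case the call to deferReply() took about 4.4 seconds to come back, even though it was started a few milliseconds after the event arrived.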

LunaUrsa avatar Mar 14 '23 21:03 LunaUrsa

I would try upgrading discord.js, there may be some bugfixes in newer versions that resolve this issue

yonilerner avatar Mar 14 '23 22:03 yonilerner

Thanks for the suggestion!

Discord.js 14.8, the latest version, was released on Sunday, two days ago. I hoped it would help, so I upgraded quickly, but it still happens a few dozen times daily. To clarify: when 14.8 was released I updated all my packages and this did not resolve the problem.

I have considered moving down to 13.14 but have yet to do that as it would be a lot more work, and I've not heard any guarantee that version doesn't have this issue. =/

I would move down to 13.14 if it were a sure shot because this error is highly annoying to users. Mod commands sometimes don't work on the first try, so it makes the entire bot seem unstable.

LunaUrsa avatar Mar 14 '23 23:03 LunaUrsa

As others have mentioned, this issue is happening across multiple libraries. Both d.js 14.8 and 13.14 should handle this exactly the same. The only notable way to stall an interaction callback at the moment is to have hit the global ratelimit (which is technically an implementation issue that never got updated), and even if you did, that would clear in no more than 1 second. cc @yonilerner

For @LunaUrsa, there were no fixes made relating to this issue in 14.8, though it would've been ideal to land that PR I mentioned earlier for it. We ended up getting really conflicting responses from devs on how "ratelimiting" works on the /callback endpoint so it stalled the PR for a while. At this point I think we are finally ready to move forward with it, so it should land in the next release, but unless you are hitting the global ratelimit it shouldn't actually affect you.

ckohen avatar Mar 15 '23 02:03 ckohen

In the interest of +1'ing this issue to highlight it is most definitely not library specific

I have encountered this in D.py, NAFF and interactions.py (rewrite and non). This is most assuredly an issue on Discord's side.

While I appreciate that it is an absolute nightmare to debug due to the infrequency and randomness of the error, it really shouldn't be brushed away as a library or network issue on the bot developer's side.

The only reason this has little outcry is because it's infrequent, and our users just retry the command after it "fails", but obviously that's terrible ux

LordOfPolls avatar Mar 15 '23 10:03 LordOfPolls

My two cents, which I have tried to provide through other means with no luck: my educated guess is that the underlying issue is Discord sometimes taking too long to process interactions. I'm unsure what causes that, as I can't debug any further from my side. I say this because of logs I have from people using our library, like the following one (note that these logs are from an old version of the lib; I haven't contacted the person for new ones, but I have been told it keeps happening, rarely, but happening):

T 2023-01-13 01:52:39,600 hikari.gateway.2: dispatching INTERACTION_CREATE with seq 16296
T 2023-01-13 01:52:39,999 hikari.rest: f640d5af92e411ed85428e896c5c2a03 POST https://discord.com/api/v10/interactions/1063274348967886899/aW50ZXJhY3Rpb246MTA2MzI3NDM0ODk2Nzg4Njg5OTpZNXZtVHgwY25NYTI2bzF2VzlFcm9VbGhHYUF5Z1MxT2xYOGR1Y0MzRGx6WW85clNuSmp1Um1kYU01SlBWbHpWMnFIaVB1WG56bmtSbTFBNjY4VEs5TlpPTVV0cVk5ZTVkbzI4TmhYR0VaMkxkNW1nT2M3ZlFiWjBYdnZucjlOVA/callback
    User-Agent: DiscordBot (https://github.com/hikari-py/hikari, 2.0.0.dev115) Nekokatt AIOHTTP/3.8.1 CPython/3.10.9 Linux 64bit
 
    {'type': <ResponseType.MESSAGE_CREATE: 4>, 'data': {'embeds': [{'title': 'Stopwatch Started!', 'color': 11814356, 'footer': {'text': 'Note: stopwatch will stop after 1 day.'}}], 'allowed_mentions': {'parse': []}}}
T 2023-01-13 01:52:43,719 hikari.rest: f640d5af92e411ed85428e896c5c2a03 404 Not Found in 3719.914702931419ms
    Date: Fri, 13 Jan 2023 01:52:43 GMT
    Content-Type: application/json
    Transfer-Encoding: chunked
    Connection: keep-alive
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    Via: 1.1 google
    Alt-Svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
    CF-Cache-Status: DYNAMIC
    Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=0DHran7ZDY%2Fmf%2F1yZ6lS6PRfwaBRa6jcF61CffN1Re7AS91smDBuc1b12qftsF5I691eJ91iABum2CdkepDgU00BAmjPiD8DJJt57yxDtasX3tEsfVzspF6KHGJV"}],"group":"cf-nel","max_age":604800}
    NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    X-Content-Type-Options: nosniff
    Set-Cookie: __cfruid=e11b5bb3826dc2e0d77c6aff187aec94fad6c899-1673574763; path=/; domain=.discord.com; HttpOnly; Secure; SameSite=None
    Server: cloudflare
    CF-RAY: 788a7e8009bb0e44-AMS
    Content-Encoding: gzip
 
    {"message": "Unknown interaction", "code": 10062}

Important things to note about the logs:

  1. The "dispatching" log is right after we receive the interaction (can also be checked by the time in the interaction ID: 1063274348967886899 => 2023-01-13, 01:52:39)
  2. The 3719.914702931419ms response time is round-trip time. This includes from making the request (after evaluation of bucket ratelimits, which are skipped for interactions anyways, so a NOOP) to receiving the response. The code can be found here
  3. A CF-RAY is provided in the response headers that could allow for further debugging, but these logs are months old and the info might not be stored anymore. I could try to ask for newer logs if deemed necessary.
  4. This might also be due to random network delay, but I can't tell for sure unless the CF-RAY is looked at, as Cloudflare should have all that info available. The average response time for this bot before and after these logs is around 500-700ms
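The timestamp check in point 1 and the timeline of the request can be reproduced with a short Python sketch. It relies on the documented snowflake layout (milliseconds since the Discord epoch in the bits above bit 22) and assumes the log timestamps are in UTC, which is consistent with the Date header in the response. `snowflake_time` and `log_time` are hypothetical helper names.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z


def snowflake_time(snowflake: int) -> datetime:
    """Creation time encoded in a Discord snowflake ID."""
    ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def log_time(text: str) -> datetime:
    """Parse a timestamp from the log above (assumed to be UTC)."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


created = snowflake_time(1063274348967886899)
print(created.strftime("%Y-%m-%d %H:%M:%S"))  # 2023-01-13 01:52:39

sent = log_time("2023-01-13 01:52:39,999")      # POST logged by hikari.rest
answered = log_time("2023-01-13 01:52:43,719")  # 404 logged by hikari.rest
sent_after_ms = (sent - created).total_seconds() * 1000
answered_after_ms = (answered - created).total_seconds() * 1000
print(f"request sent {sent_after_ms:.0f} ms after creation")
print(f"response received {answered_after_ms:.0f} ms after creation")
```

With these inputs the request is logged less than a second after the interaction was created, while the 404 is logged more than 3 seconds after it. The comparison assumes the bot's clock is reasonably in sync with Discord's.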

davfsa avatar Mar 15 '23 11:03 davfsa