[HOLD #53826] Not receiving realtime updates to desktop/web session
If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!
Version Number:
Reproducible in staging?: Needs Reproduction
Reproducible in production?: Needs Reproduction
If this was caught on HybridApp, is this reproducible on New Expensify Standalone?:
If this was caught during regression testing, add the test name, ID and link from TestRail:
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL:
Issue reported by: @quinthar
Slack conversation (hyperlinked to channel name): ts_external_expensify_quality
Action Performed:
- Login to staging.new.expensify.com as user A
- As user B send messages to user A
Expected Result:
User A receives message in real time
Actual Result:
For user A, the typing indicator is displayed, but new messages are not received in real time on the desktop/web session, even though push notifications for the same messages arrive on mobile.
Workaround:
Can the user still use Expensify without this being fixed? Have you informed them of the workaround?
Platforms:
Which of our officially supported platforms is this issue occurring on?
- [ ] Android: Standalone
- [ ] Android: HybridApp
- [ ] Android: mWeb Chrome
- [ ] iOS: Standalone
- [ ] iOS: HybridApp
- [ ] iOS: mWeb Safari
- [x] MacOS: Chrome / Safari
- [ ] MacOS: Desktop
Screenshots/Videos
Add any screenshot/video evidence
https://github.com/user-attachments/assets/7b474301-da0e-40c3-b26a-188569db9537
Issue Owner
Current Issue Owner: @deetergp
Triggered auto assignment to @deetergp (AutoAssignerNewDotQuality)
Triggered auto assignment to @trjExpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.
This has been labelled "Needs Reproduction". Follow the steps here: https://stackoverflowteams.com/c/expensify/questions/16989
@deetergp I'm assuming notification issues like this need to remain internal, but let me know if you don't think so and we can ask a C+ to get involved as a next step to try and reproduce.
I seemingly can't repro this myself. Question from the thread is: "Why isn't the ping/pong detecting and fixing this?"
This happened again; I can't figure out how to reproduce reliably though.
@deetergp, @trjExpensify Huh... This is 4 days overdue. Who can take care of this?
@deetergp thoughts on the above, will you be able to look at this today?
CC: @muttmuure I think this one is in the CRITICAL category for #quality, so I've moved it there.
Great!
@trjExpensify I've spent a bit of time with this today and I also can't seem to reproduce it. I've been having a protracted conversation between the ExpensiScotts (-fy.com & -fail.com) in split-screen browser windows, and messages come through fine for both. I'm looking at the DM chat between DB & Kadie to see if there's anything "off" about what's in Auth and in the logs.
Gotcha. I'm sure DB would be happy to live debug or something, if you want to take it to the thread: https://expensify.slack.com/archives/C05LX9D6E07/p1731449676200089?thread_ts=1731299637.345689&cid=C05LX9D6E07
@deetergp, @trjExpensify Whoops! This issue is 2 days overdue. Let's get this updated quick!
Spent a bit of time looking into this today, and it's interesting. A log search for blob:"PusherError" returns tens of thousands of results for just the last 24 hours. They all have error code 1006, about which Pusher's documentation says:
When a WebSocket connection is closed without a "close frame", the pusher-js library emits an error with code 1006. Usually this is caused by WebSocket-incompatible proxies, which can't close the connection in the correct way.
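For reference, code 1006 comes from the WebSocket spec (RFC 6455, section 7.4.1) rather than from Pusher itself: it is reserved, is never sent in a close frame, and is reported locally by the client when the connection ends without a close handshake. A minimal TypeScript sketch of how a client might label close codes; the `describeCloseCode` helper is hypothetical and not part of pusher-js:

```typescript
// Hypothetical helper: label a WebSocket close code per RFC 6455, section 7.4.1.
interface CloseCodeInfo {
  code: number;
  label: string;
  // True when the connection ended without a proper close handshake.
  abnormal: boolean;
}

function describeCloseCode(code: number): CloseCodeInfo {
  switch (code) {
    case 1000:
      return { code, label: "Normal closure", abnormal: false };
    case 1001:
      return { code, label: "Going away", abnormal: false };
    case 1006:
      // Reserved: never sent on the wire. The client reports it itself when
      // the socket dropped before any close frame arrived (e.g. a proxy or a
      // flaky network cut the TCP connection).
      return { code, label: "Abnormal closure (no close frame)", abnormal: true };
    default:
      // 4000-4999 is reserved for private use; Pusher uses part of this range.
      if (code >= 4000 && code <= 4999) {
        return { code, label: "Private use (e.g. Pusher protocol errors)", abnormal: false };
      }
      return { code, label: "Other", abnormal: false };
  }
}
```

The key property is that a 1006 never carries the server's reason for closing, which is why the library can only guess at a cause such as an incompatible proxy.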
Looking specifically into @quinthar's logs, I see an interesting 1006 log line: Software caused connection abort. Between my own searching and ChatGPT, it sounds like poor network connectivity can be a culprit, as can a "Version or Library Mismatch". I found some GH issues from 2021 that talk about needing to be on the latest (for the time) version of 9.x. Looking in our package.json file, it looks like we are on v8.3.0. Maybe we need to update the version of the Pusher client we are using?
I'm not sure how involved updating to a newer version might be; maybe @mountiny or @AndrewGable might have some insight?
@deetergp @trjExpensify this issue was created 2 weeks ago. Are we close to a solution? Let's make sure we're treating this as a top priority. Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
@deetergp I don't know the specifics of what updating Pusher would involve, but here is a PR from the last time we did it, and it went fine without any specific testing. So I would check whether there are any breaking changes that should worry us and then try the update. However, we are already on the latest officially stable version, 8.3.0 (https://www.npmjs.com/package/pusher-js?activeTab=versions); the next version, 8.4.0, is still a release candidate.
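Stable versus prerelease is decided by the semver prerelease suffix: a version like `8.4.0-rc1` has lower precedence than `8.4.0` and is normally not published under npm's `latest` dist-tag. A small TypeScript sketch of picking the newest stable version from a list; the `latestStable` helper and the sample version strings are illustrative, not the actual pusher-js release list:

```typescript
// Hypothetical helper: parse plain MAJOR.MINOR.PATCH[-prerelease] strings.
// Build metadata ("+...") is not handled.
function parseVersion(v: string): { nums: number[]; prerelease: boolean } | null {
  const match = /^(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?$/.exec(v);
  if (!match) return null;
  return {
    nums: [Number(match[1]), Number(match[2]), Number(match[3])],
    prerelease: match[4] !== undefined,
  };
}

// Compare two [major, minor, patch] tuples numerically (so 8.10.0 > 8.9.0).
function compareNums(a: number[], b: number[]): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Return the highest version that has no prerelease suffix, or null if none.
function latestStable(versions: string[]): string | null {
  let best: string | null = null;
  let bestNums: number[] = [];
  for (const v of versions) {
    const parsed = parseVersion(v);
    if (!parsed || parsed.prerelease) continue; // skip rc/beta builds
    if (best === null || compareNums(parsed.nums, bestNums) > 0) {
      best = v;
      bestNums = parsed.nums;
    }
  }
  return best;
}
```

With a list such as `["7.4.0", "8.3.0", "8.4.0-rc1"]`, this returns `"8.3.0"`, matching the point that an RC is not yet the stable release.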
Hmm… Maybe I'm confusing versions of other things. @quinthar Does this happen when you're using a poor connectivity setting in Dev Tools? Just trying to narrow down possible causes…
@deetergp, @trjExpensify Eep! 4 days overdue now. Issues have feelings too...
No, I never use that setting. However, I do often work on poor networks.
More discussion happening here and it sounds like we are leaning toward carving out a mini project to add our own Pusher ping to ensure the channel is open.
@deetergp, @trjExpensify Huh... This is 4 days overdue. Who can take care of this?
That sounds like a good idea! Will you create a new tracking issue for that, or use this one?
I think a new tracking issue is probably the right call. I'll make a new one and close this one in favor of that.
Going on hold until we have an application-layer PING dedicated to Pusher. When we do, we can either fix any actionable connection failures we find, or close this if a consistent application-layer PING keeps Pusher working well.
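The proposed application-layer PING could look roughly like the following TypeScript sketch. Everything here is an assumption: the `PusherPingMonitor` class, the interval and timeout values, and the idea that the server answers each ping by pushing a pong event back over Pusher. What it shows is that the check runs end to end through the Pusher connection, so a socket that looks open but has silently dropped gets detected and reset.

```typescript
// Hypothetical application-layer heartbeat for a Pusher connection.
// Times are passed in explicitly so the same logic can run from setInterval
// in the app or from plain numbers in a test.
type TickAction = "none" | "sendPing" | "reconnect";

class PusherPingMonitor {
  private readonly pingIntervalMs: number;
  private readonly pongTimeoutMs: number;
  private lastPingSentAt: number | null = null;
  private awaitingPong = false;

  constructor(pingIntervalMs: number, pongTimeoutMs: number) {
    this.pingIntervalMs = pingIntervalMs;
    this.pongTimeoutMs = pongTimeoutMs;
  }

  // Call periodically; returns what the caller should do next.
  tick(now: number): TickAction {
    if (this.awaitingPong && this.lastPingSentAt !== null) {
      if (now - this.lastPingSentAt >= this.pongTimeoutMs) {
        // No pong arrived in time: treat the socket as silently dead.
        this.reset();
        return "reconnect";
      }
      return "none";
    }
    if (this.lastPingSentAt === null || now - this.lastPingSentAt >= this.pingIntervalMs) {
      // Time for the next ping; the caller sends it and we wait for the pong.
      this.lastPingSentAt = now;
      this.awaitingPong = true;
      return "sendPing";
    }
    return "none";
  }

  // Call when the pong event for the outstanding ping arrives over Pusher.
  onPong(): void {
    this.awaitingPong = false;
  }

  // Forget any outstanding ping, e.g. after forcing a reconnect;
  // the next tick then sends a fresh ping immediately.
  reset(): void {
    this.lastPingSentAt = null;
    this.awaitingPong = false;
  }
}
```

In the app, `tick(Date.now())` could be driven by `setInterval`, and a `"reconnect"` result could call pusher-js's `disconnect()` followed by `connect()`. Keeping time as a parameter makes the decision logic testable without real timers.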
@deetergp before I add AutoAssignerNewDotQuality to the issue this is held on, do you want to be assigned to that one too, since you've already done some research?
@mallenexpensify @muttmuure Let's put this one out there and see if someone else wants it. I've got Vacation Delegates and Improving System Messages in my immediate future, but if this is still out there when I get those in a happy place, I'll snag it.
I'm going to take this and begin looking into it.
Daily Update
I looked into some of these logs, and I found that the logs for [email protected] are actually quite abnormal when it comes to the 1006 error. I compared them with some other accounts ([email protected] logs and also [email protected] logs).
David's logs contain error messages from Pusher like:
- Read error: ssl=0xb400007bf6708a18: I/O error during system call, Software caused connection abort
- Software caused connection abort
- Connection interrupted (undefined)
These errors are virtually non-existent from the other two accounts.
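To find which other users hit this failure mode, logged 1006 errors could be split by whether they carry one of the messages above. A TypeScript sketch, assuming a hypothetical `PusherErrorLog` shape (account, code, optional message); the real log format may differ:

```typescript
// Hypothetical shape of a logged PusherError entry.
interface PusherErrorLog {
  account: string;
  code: number;
  message?: string;
}

// Message fragments seen in the abnormal logs.
const ABNORMAL_PATTERNS = ["Software caused connection abort", "Connection interrupted"];

// A 1006 counts as "abnormal" only when it carries one of the messages above;
// a 1006 with no message is the common, self-recovering case.
function isAbnormal1006(entry: PusherErrorLog): boolean {
  const message = entry.message;
  if (entry.code !== 1006 || !message) {
    return false;
  }
  return ABNORMAL_PATTERNS.some((pattern) => message.includes(pattern));
}

// Count abnormal 1006 errors per account.
function countAbnormalByAccount(entries: PusherErrorLog[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    if (isAbnormal1006(entry)) {
      counts.set(entry.account, (counts.get(entry.account) ?? 0) + 1);
    }
  }
  return counts;
}
```

Accounts with a non-zero count would be candidates for the same "never reconnects" behavior.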
I have opened up a support case with Pusher to see if they can help diagnose and understand these errors better and see if we can find a fix.
Next Steps
- @tgolen Work with Pusher support to understand the errors more
- @tgolen Begin investigating the abnormal errors to see which other users are experiencing them
ETA
- TBD
Daily Update
- I haven't gotten any response from Pusher
- I submitted another support request to them, this time from infra@expensify, which is the email attached to the account (yesterday I had used my own email)
Contents of Email
Hello,
Our Expensify application has users experiencing many connection errors that return with the error code 1006. I have been trying to debug these errors more and I am reaching out for your help and guidance.
I looked at our logs for three users:
The first two accounts have normal looking logs. When the 1006 error happens, there is no "message" property in the error and their clients are able to reconnect right away.
The last account has abnormal looking logs. The 1006 error has a "message" property with things like:
- Read error: ssl=0xb400007bf66e7358: I/O error during system call, Software caused connection abort
- Connection interrupted (undefined)
- Software caused connection abort
These messages are virtually non-existent in the first two accounts, but this is not the only account they are happening on. When these errors happen, the client is NOT able to reconnect to Pusher; it stays disconnected and can't receive any more Pusher messages.
I have gathered the logs for the last couple of days and put them into this Google doc which you can see for further analysis: https://docs.google.com/document/d/1TYUsq2IYaKiNTAWE89LH5Cmq2WLbBeRQ-3sCp--goZY/edit?tab=t.0
Can you please help me diagnose these messages further? Thank you, Tim
Next Steps
- @tgolen Wait for a reply from Pusher and try to work with them to understand the errors
ETA
- TBD
Daily Update
- I still haven't gotten any response from Pusher
- I found a contact form on their site for "Premium Support" and I filled out the contact form to see if that will wake someone up
Next Steps
- @tgolen Keep trying to engage with Pusher and contact them
ETA
- TBD
Reckon we won't hear back soon cuz of the holidays. I don't see an email in the cell for Relationship Owner (External) on the VML. Cole's the relationship owner, if we're still getting 👻 from Pusher, he can likely help light a 🔥 in the new year.
@tgolen, @trjExpensify Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!