
"we couldn't reach this homeserver" is incredibly vague and hard to debug

Open anarcat opened this issue 8 months ago • 20 comments

Steps to reproduce

  1. i'm trying to login to matrix.debian.social with element x
  2. i follow the login process and enter the homeserver URL and fail

Outcome

What did you expect?

I expected to be able to login to the homeserver. It's running synapse 1.127.1 so it should have sliding sync (right?).

What happened instead?

Instead, it now fails on the MAS stuff, as it gets a 404 on /_matrix/client/unstable/org.matrix.msc2965/auth_metadata.

It seems to me this could be relayed back to the user. There are a couple of similar bug reports in this issue tracker, yet they all vary subtly:

  • #1097 is about missing sliding sync support (same error message!)
  • #3447 seems to be about proxy issues
  • #954 is similarly unclear but seemed to be about SSO

In general, please percolate the actual error back to the user, including the HTTP status code and the URL it tried to reach. Telling users to "contact their homeserver admin" is not really useful, especially when the user is the homeserver admin. I only found out what the issue was by connecting to a separate IP (so I could find the traffic in the log) and grepping homeserver.log, which is not really intuitive.

I am also, fundamentally, puzzled that Element X already requires MAS given that this was deployed only last week on Matrix.org. Maybe I'm missing something?

Your phone model

irrelevant

Operating system version

Android 15 / CalyxOS

Application version and app store

element x from f-droid

Homeserver

1.127.1

Will you send logs?

Yes

Are you willing to provide a PR?

No

anarcat avatar Apr 08 '25 19:04 anarcat

Same on iOS - spent a few days, have no clue how to troubleshoot it at all

alexander-potemkin avatar Apr 20 '25 20:04 alexander-potemkin

As it turned out, it could also mean that the server accepts TLS 1.3, which is not supported: https://github.com/element-hq/element-x-ios/issues/786
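To check the TLS angle from the admin side, here is a minimal Python sketch that attempts one handshake per TLS version and reports the outcome. The helper names `pinned_context` and `probe_tls` are hypothetical, and the host in the usage note is a placeholder; this only shows what the server negotiates, not what either Element X app supports.

```python
import socket
import ssl


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Return a client context that will only negotiate `version`."""
    ctx = ssl.create_default_context()
    # Lower the ceiling first, then raise the floor, so min never exceeds max.
    ctx.maximum_version = version
    ctx.minimum_version = version
    return ctx


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Try one handshake per TLS version and report what happened."""
    results = {}
    for version in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
        try:
            with socket.create_connection((host, port), timeout=timeout) as raw:
                ctx = pinned_context(version)
                with ctx.wrap_socket(raw, server_hostname=host) as tls:
                    results[version.name] = f"ok ({tls.version()})"
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            results[version.name] = f"failed: {exc}"
    return results
```

Calling `probe_tls("matrix.example.org")` returns a dict with one entry per version, which makes it easy to see whether a reverse proxy only accepts one of them.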

alexander-potemkin avatar Apr 20 '25 21:04 alexander-potemkin

@anarcat

I am also, fundamentally, puzzled that Element X already requires MAS given that this was deployed only last week on Matrix.org. Maybe I'm missing something?

Yes you missed something.

https://matrix.debian.social/_matrix/client/v3/login looks like this:

{
  "flows": [
    {
      "type": "m.login.sso",
      "identity_providers": [
        {
          "id": "oidc",
          "name": "Salsa",
          "brand": "salsa"
        }
      ]
    },
    {
      "type": "m.login.token"
    },
    {
      "type": "m.login.application_service"
    }
  ]
}

However Element X Android only supports these two methods:

  • Username / Password
  • Next-gen auth, currently provided by matrix-authentication-service (MAS); this is different from Synapse's built-in SSO

But besides this, yes, Element X Android can't communicate to the user that the homeserver has this kind of unsupported configuration.
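The check described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment, not the app's actual logic: the function name `explain_login_flows` and its `has_auth_metadata` parameter are hypothetical, and `has_auth_metadata` stands for "the auth_metadata endpoint mentioned in the issue answered successfully".

```python
# Assumption, taken from the comment above: the app can log in with a
# password flow or with next-gen auth (discovered separately, via the
# auth_metadata endpoint), but not with legacy SSO alone.
PASSWORD_FLOW = "m.login.password"
LEGACY_SSO_FLOW = "m.login.sso"


def explain_login_flows(login_response: dict, has_auth_metadata: bool) -> str:
    """Turn a GET /_matrix/client/v3/login body into a one-line verdict."""
    flow_types = {flow.get("type") for flow in login_response.get("flows", [])}
    if has_auth_metadata:
        return "supported: next-gen auth is advertised"
    if PASSWORD_FLOW in flow_types:
        return "supported: password login is advertised"
    if LEGACY_SSO_FLOW in flow_types:
        return "unsupported: only legacy SSO is advertised"
    return "unsupported: no known login flow is advertised"
```

Fed the response shown above with `has_auth_metadata=False`, it reports that only legacy SSO is advertised, which is the kind of message the error dialog could carry.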

ShadowRZ avatar Apr 30 '25 04:04 ShadowRZ

This error must only happen to administrators who are testing their matrix backend setup before sharing it to their users. Normal users should not see it.

Any error message within a popup will always not be enough to debug the problem. If you see this message, tap seven times on the version number displayed on the first screen of the tap to display the bug report screen. From there, you will be able to extract the logs from the app to better target the issue.

For info, since this issue was created, ESS Community is now available to set up the whole backend stack in a few minutes. You can have a look at this talk given at the Matrix Conference yesterday: Getting started with Element Server Suite Community.

manuroe avatar Oct 19 '25 12:10 manuroe

Any error message within a popup will always not be enough to debug the problem.

Probably it won't be 100% sufficient, but not displaying errors at all - doesn't sound like a nice user experience for me as well

If you see this message, tap seven times on the version number displayed on the first screen of the tap to display the bug report screen. From there, you will be able to extract the logs from the app to better target the issue.

I guess there is some typo here: "on the version number displayed on the first screen of the tap" - could you please, give a bit more details on where the tap shall happen?

Would that window accumulate all of the errors, all along with timestamps - to make some sense on them?

alexander-potemkin avatar Oct 20 '25 13:10 alexander-potemkin

This error must only happen to administrators who are testing their matrix backend setup before sharing it to their users. Normal users should not see it.

I don't understand this argument. Because we're administrators, we're not entitled to proper error messages that explain to us the proper way of going forward?

Any error message within a popup will always not be enough to debug the problem. If you see this message, tap seven times on the version number displayed on the first screen of the tap to display the bug report screen. From there, you will be able to extract the logs from the app to better target the issue.

Good to know, thanks! But this kind of obscure behavior is not helpful, in my opinion. It's nice that this kind of information is available, but why is it hidden behind what looks like an easter egg? What about just a "debugging information" link? Or just showing the debugging information right up? As you said, regular users are not supposed to see this anyways, so why tone it down?

For info, since this issue was created, ESS Community is now available to set up the whole backend stack in a few minutes.

That seems like a whole Matrix distribution. I suspect many operators will not want or be able to deploy this. And, at first glance, this is a Helm chart to be deployed with Kubernetes.

Are you saying that Kubernetes is now the only supported way to deploy Matrix? That escalated quickly, as they say! :)

anarcat avatar Oct 20 '25 13:10 anarcat

@manuroe

This error must only happen to administrators who are testing their matrix backend setup before sharing it to their users. Normal users should not see it.

I'd hope you can explain this in detail and in a timely manner, especially the "must" part, to address what @anarcat said in https://github.com/element-hq/element-x-android/issues/4556#issuecomment-3422057523 , among other points in that comment.

ShadowRZ avatar Oct 21 '25 05:10 ShadowRZ

We are building the app for users, not for server administrators. We want to make it as simple as possible for them, to meet the standard set by the big players in the messaging area. It is not simple and we still have a lot to do. Matrix is powerful and offers much more with decentralisation, federation, e2ee, etc. Some of those concepts are still vague for users. This is why we focus more on the UX for users than on the UX for server admins. Every new button in the app is challenged.

Are you saying that Kubernetes is now the only supported way to deploy Matrix? That escalated quickly, as they say! :)

No, not at all. As app developers, we are also aware that the matrix stack got more complex to deploy. We are really pleased to have this ESS community that allows us to set up all the backend in one minute or two so that we can focus on the app dev.

Good to know, thanks! But this kind of obscure behavior is not helpful, in my opinion. It's nice that this kind of information is available, but why is it hidden behind what looks like an easter egg? What about just a "debugging information" link? Or just showing the debugging information right up? As you said, regular users are not supposed to see this anyways, so why tone it down?

It is really about not overflowing normal users with information they do not need. They do not need to be aware of (or scared by) the complexity behind.

I don't understand this argument. Because we're administrators, we're not entitled to proper error messages that explain to us the proper way of going forward?

As administrators, a log file will be more useful anyway. We should add a note in the readme about this 7-tap trick on the version number. The detailed data is available. It is more a matter to get it. Perhaps having it on a server installation guide will be even more pertinent for administrators as this case should be only observed by them. What guides are you using to install your matrix backend?

manuroe avatar Oct 21 '25 08:10 manuroe

I can't agree with any of those statements. But would it even matter?

ESS Community is Kubernetes and no - it won't make life easier, nor cheaper (DevOps guys are expensive).

No, log files are not better - I don't need the whole set of errors or messages - I just need the one thing that went wrong.

And no - normal users are not "overflowed" with something they don't understand - they just pass it over to the system administrators. Because system administrators / engineers can't have access to the mobile phones of users.

But would it really matter? Who challenges the UI? Based on what? Who is the target user? Matrix.org users, who bring a negative impact to the company as a business, or large customers, who bring some profit? Or is it Element engineers, based on their perspective?

alexander-potemkin avatar Oct 21 '25 11:10 alexander-potemkin

Feedback does matter. I am just trying to explain our focus for the app and set expectations regarding this issue. Making the app good for users is already a challenge for a small team. We cannot chase every demand.

By closing this issue, I wanted to make it transparent that there is no plan to display more details about backend configuration issues within the app. There is already a way to extract logs from the app for technical investigations. Sadly, we did not communicate enough about it.

This issue can only happen when configuring the system. Am I so wrong thinking that admins should check the behavior end to end before sharing the app to their users? In terms of focus, do you really want us to focus on this problem instead of other UX issues?

I can understand it is boring to look at the logs but I cannot really see how the app could report the right info from those complex environments. App logs plus backend logs give the super power to admins.

manuroe avatar Oct 21 '25 12:10 manuroe

Am I so wrong thinking that admins should check the behavior end to end before sharing the app to their users?

You are not wrong. But how can we do the test? By being the users ourselves. Unless there is some health-check pack, which I would be glad to run 24/7 against my servers.
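As an illustration of what such a health-check could look like, here is a minimal Python sketch. It is not an official tool: the function names and the path list are assumptions, built from the endpoints named in this thread plus the standard discovery endpoints, and it is not the app's actual request sequence. Note that `/.well-known/matrix/client` lives on the server-name domain, which may differ from the homeserver base URL.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumed list of paths worth probing; adjust to your deployment.
CHECK_PATHS = [
    "/.well-known/matrix/client",
    "/_matrix/client/versions",
    "/_matrix/client/v3/login",
    "/_matrix/client/unstable/org.matrix.msc2965/auth_metadata",
]


def http_status(url: str) -> int:
    """Return the HTTP status for a GET on `url` (raises on network failure)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a useful status code


def check_homeserver(base_url: str, fetch: Callable[[str], int] = http_status) -> list:
    """Probe each path and return one human-readable line per result."""
    lines = []
    for path in CHECK_PATHS:
        url = base_url.rstrip("/") + path
        try:
            status = fetch(url)
            verdict = "OK" if status == 200 else "FAIL"
            lines.append(f"{verdict} {status} {url}")
        except OSError as exc:  # DNS, TLS, timeout, connection refused...
            lines.append(f"FAIL {type(exc).__name__} {url}: {exc}")
    return lines
```

Running `print("\n".join(check_homeserver("https://matrix.example.org")))` from cron would give a rough 24/7 check; the `fetch` parameter exists so the logic can be tested without touching the network.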

In terms of focus, do you really want us to focus on this problem instead of other UX issues?

Personally I don't believe it's one thing vs the other. Element feels unpredictable, buggy and unclear to many. When you design the system as a clear and very stupid thing: do X -> get Y or Z (error or result) - it's much more predictable and easier to use, develop and support. In my humble opinion, of course.

I can understand it is boring to look at the logs but I cannot really see how the app could report the right info from those complex environments. App logs plus backend logs give the super power to admins.

It is not boring. It's time consuming - getting your head around which log is about what. And a special bonus goes to the user who manages to extract it and send it to the admin.

Making the app good for users is already a challenge for a small team.

I do know! Making anything at all is always much more complicated than commenting on it. I understand. The things I can't understand are:

  • why bring in Kubernetes, then - it's a behemoth that is rarely needed, and those who do need it have their own teams most of the time
  • why not choose cross-platform toolkits / frameworks, instead of supporting 2 iOS apps (Swift), 2 Android apps (Kotlin), a JavaScript / TypeScript web & desktop app, Rust & Go & Python on the back-end and a separate back-end - all of that is a huge overhead, and it looks like none of those things (except for Synapse) is stable enough; then add another component, Element Call, instead of making Jitsi calls just work (and retiring 1:1 WebRTC calls)
  • why ignore PRs / leave them hanging, mentioning that the app is in life-support stage only, and it has been so for about a year now

So - no, I'm not saying it's all easy and you are doing it all wrong - obviously not - you do have a working product, used by thousands at least. But I'm not sure I understand all of the decisions being made, if you ask me. And in your place, I wouldn't ask anyone - just because you can't listen to everyone and make everyone happy :)

alexander-potemkin avatar Oct 21 '25 12:10 alexander-potemkin

It is not boring. It's time consuming - getting your head around which log is about what. And a special bonus goes to the user who manages to extract it and send it to the admin.

I lost so much time with users trying to debug this. This error does not make any sense for a user, and for an admin it is very obscure. After trying servers other than the one I was deploying, I concluded that Element X is not compatible with old servers. The error should say that. Users are very unhappy, and so am I.

Please improve the meaning of this error, so fewer people waste time on it.

Also, I had to use BrowserStack App Live to inspect the network requests and compare between servers. It showed that yes, it communicated with my server by asking for the client JSON info, but for some obscure reason it does not go any further.

williamdes avatar Oct 22 '25 08:10 williamdes

@manuroe

As administrators, a log file will be more useful anyway. We should add a note in the readme about this 7-tap trick on the version number. The detailed data is available. It is more a matter to get it.

I tried what you said. It opens the bug report screen which, in this context, doesn't help a lot, as the bug report screen is mainly designed for sending logs to developers. For troubleshooting a server connection, if EXA says it couldn't reach this homeserver, I'd personally prefer that it let me dump some log files (or maybe open the bug report screen instead?) for further troubleshooting.

Also, note that EXA certainly doesn't support some kinds of server configurations, namely legacy-SSO-only setups without password fallback, which exist because some are unwilling to (or maybe just can't?) deploy MAS (case in point: https://miruku.cafe/notes/adno7x7zkw ). Currently I don't see this communicated well.

Speaking of unsupported servers, I just tried using EXA to log in to matrix.debian.social, which at the moment doesn't work with EXA. Sadly, when using the trick you provided above to inspect logs, I found no details about the communication with the homeserver. The only file I found was logcat.log, which appears to contain no useful information that could be provided to server admins.

ShadowRZ avatar Oct 23 '25 01:10 ShadowRZ

Hi all, and the Administrator in the thread!

One of my users reported that they see this error message on their Android device but can log into the chat using a browser on a PC. The problem is still unsolved.

My server logs are empty, or I’m missing something that could hint at the root cause of the error.

The user can’t provide any details except this nonsense message:

We are building the app for users, not for server administrators.

I’m sorry, but how can an unsolvable issue help users use the app? My user sees a wall of useless text instead of something short like “HTTP error 404” or “SSL problem”. But they see “contact the administrator”.

  • Ok, the administrator is online, what’s the error you see?
  • I see “contact the administrator”.

I don't find this funny, btw.

Should I open a bug report related to the Android version of Element X? I’ll attach this “detailed” log message for you as a developer - is that OK?

My proposal is to have a “See details” button on the error dialog if you don’t want to expose all the technical details instantly, so that users could forward this info to an administrator.
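To illustrate how little is needed for such a "See details" line, here is a minimal Python sketch that reduces a failure to one short sentence. It is an admin-side illustration using Python's standard library, not the app's code (the app is not written in Python); the function name `short_failure_reason` and the message wording are hypothetical.

```python
import ssl
import urllib.error


def short_failure_reason(exc: Exception, url: str) -> str:
    """Reduce an exception to one line a user can read out to an admin."""
    # HTTPError must be tested first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP error {exc.code} on {url}"
    # urllib wraps lower-level failures in URLError and keeps them in .reason.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, ssl.SSLError):
        return f"SSL problem on {url}"
    if isinstance(cause, TimeoutError):
        return f"Timeout on {url}"
    return f"Network error on {url}: {cause}"
```

The point of the design is that the user-facing text stays short, while the status code and URL are still there for whoever has to fix the server.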

P.S.: I would also be happy for any help in solving this particular case.

todo0123 avatar Oct 29 '25 12:10 todo0123

@todo0123, in my case I managed to troubleshoot the issue by enabling debug-level log messages on the reverse proxy - that was (not) funny as well, but it helped me narrow things down to SSL-level issues.

alexander-potemkin avatar Oct 29 '25 12:10 alexander-potemkin

Ok, the administrator is online, what’s the error you see? I see “contact the administrator”.

Nothing wrong here - Element is about facilitating people to communicate: why don't you add more reasons for the users to get in direct contact with administrators?

alexander-potemkin avatar Oct 29 '25 12:10 alexander-potemkin

Ok, the administrator is online, what’s the error you see? I see “contact the administrator”.

Nothing wrong here - Element is about facilitating people to communicate: why don't you add more reasons for the users to get in direct contact with administrators?

Nice joke. Moreover, I think it's better not to have Element X at all - to have more reasons to talk offline, yeah?

I installed the Element X server specifically to bypass my native country's censorship for my interlocutors. If I had the opportunity to chat and make calls with these people, I would never have installed Element.

These people are behind an NGFW that blocks even encrypted voice calls, most VPNs, etc.

Just imagine trying to set this up for your granny, and she has to find and send Android logs instead of telling you a simple message like "SSL error" (literally: I need to have all my family in one chat).

Reading Nginx debug logs is overkill when you just need one simple failure reason from the client.

As a bottom line, I think good messaging software shouldn’t push people to communicate outside the platform.

todo0123 avatar Oct 29 '25 16:10 todo0123

Nice joke.

Glad you liked it 😇

Reading Nginx debug logs is overkill when you just need one simple failure reason from the client.

It is a dirty hack, and it's Caddy logs in my case. But if you need to have things working, you do what you can... :(

As a bottom line, I think good messaging software shouldn’t push people to communicate outside the platform.

Agreed. I'm not taking sides, but for the sake of truth, Matrix is the only protocol now that lets people communicate in encrypted groups across devices, and Element is the only client that implements most of the features (especially if you are fine collecting them between both Element Classic and Element X and don't mind some hacks).

So... Things should, hopefully, change, but for now - Element is dragging the whole community into the enterprise like solutions; I guess, as a desperate attempt to compensate the fact, that some of the value created is grabbed by 3rd party vendors, leaving the company with not as much as investors / founders want.

alexander-potemkin avatar Oct 29 '25 22:10 alexander-potemkin

@manuroe I'd like you to respond to https://github.com/element-hq/element-x-android/issues/4556#issuecomment-3461263728 , especially:

We are building the app for users, not for server administrators.

I’m sorry, but how can an unsolvable issue help users use the app?

Also see https://github.com/element-hq/element-x-android/issues/5627

ShadowRZ avatar Oct 31 '25 01:10 ShadowRZ

So... Things should, hopefully, change, but for now - Element is dragging the whole community into the enterprise like solutions; I guess, as a desperate attempt to compensate the fact, that some of the value created is grabbed by 3rd party vendors, leaving the company with not as much as investors / founders want.

The picture is about right. We need to secure Element to be able to continue improving Matrix. But the goal and the heart are still to bring Matrix to everyone. This is why we released ESS Community. The stack may not suit everyone, but it is the stack Element uses in its pro product. We can share it without adding too many distractions for our small teams.

To come back to the issue, we compared the behavior of EXA and EXI on some of the servers shared here. EXI provided an additional bit of information in case of error. EXA also had an issue resolving some domains unless https:// was included. Both have been taken into account in https://github.com/element-hq/element-x-android/pull/5692.

I am reopening the issue but as an enhancement request, not as a defect.

There were good suggestions here, like offering logs from the error popup to be shared with the admin. But to make that really useful, we need to temporarily increase the log level during the authentication step. And to really finish the work, we need to add the admin details to this error popup, as advertised by GET /.well-known/matrix/support.
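For reference, extracting the admin details from that document could look like the minimal Python sketch below. The field names (`contacts`, `role`, `matrix_id`, `email_address`, `support_page`) and the `m.role.admin` value follow the support document format as I understand it from the Matrix spec; the function name `admin_contacts` is hypothetical, and the app itself is not written in Python.

```python
def admin_contacts(support_doc: dict) -> list:
    """Format the admin contacts from a /.well-known/matrix/support body."""
    lines = []
    for contact in support_doc.get("contacts", []):
        if contact.get("role") != "m.role.admin":
            continue  # skip other roles, e.g. security contacts
        handles = [contact.get("matrix_id"), contact.get("email_address")]
        line = " / ".join(h for h in handles if h)
        if line:
            lines.append(line)
    if support_doc.get("support_page"):
        lines.append(f"Support page: {support_doc['support_page']}")
    return lines
```

With such a list in the popup, "contact your homeserver admin" would at least come with a way to reach them.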

manuroe avatar Nov 06 '25 17:11 manuroe