[flagd-ui] Spans with error
Changes
- Switch flagd-ui navbar links from plain anchors to LiveView navigation (.link navigate) to avoid full page reloads and websocket teardown when switching between Basic and Advanced.
- Fix LiveSocket URL construction under the /feature base path so the websocket connects to /feature/live (was /featurelive), preventing spurious disconnects.
Merge Requirements
For new feature contributions, please make sure you have completed the following essential items:
- [x] CHANGELOG.md updated to document new feature additions
- [x] Appropriate documentation updates in the docs
- [x] Appropriate Helm chart updates in the helm-charts
Maintainers will not merge until the above have been completed. If you're unsure which docs need to be changed, ping @open-telemetry/demo-approvers.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: julianocosta89 / name: Juliano Costa (34f66abb2cfe781b7d0e575f972f1847ce863109)
@jack5341 thx for that!
I still see an error when accessing the flagd-ui service though:
This error happens every time I open /feature.
Did you guys face something like that before in this repository?
Not only the CSS but also the JS files are unreachable, so I can’t make any function calls. Even when I switch to another branch, I still encounter the same problem. Yesterday everything was working fine; this issue started today.
Additionally, I use OrbStack for Docker.
My console
GET http://localhost:32935/feature/assets/css/app-6f5d86242cf5220b8531adc7351da8bc.css?vsn=d net::ERR_ABORTED 404 (Not Found)
(index):10
GET http://localhost:32935/feature/assets/js/app-fb088dfe3c12b4ebb739348d1a2a3a57.js?vsn=d net::ERR_ABORTED 404 (Not Found)
Did you guys face something like that before in this repository?
Not really, and today I was able to run your branch fine.
I had to access the /feature to get the traces with error, and everything worked fine
Do you have any idea what could cause this problem?
To help other users see that issue, make sure the project is always started with the make start command. I am on my way again.
Hey @julianocosta89, I’ve fixed it. The issue was caused by the protocol switch during WebSocket connections: HTTP 101 responses were being treated as errors, even though that status is expected for WebSockets. The simple and reliable fix was to add a filter to the otel-collector processors that sets the span status to UNSET for spans with http.status_code 101.
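For reference, a minimal sketch of what such a rule could look like with the collector's transform processor and OTTL; the processor name, its placement, and the exact attribute key are assumptions here rather than the PR's actual contents:

```yaml
processors:
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # HTTP 101 (Switching Protocols) is the expected reply to a WebSocket
          # upgrade, so clear the error status recorded for those spans.
          - set(status.code, STATUS_CODE_UNSET) where attributes["http.status_code"] == 101
```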
update
Hey @jack5341, now I'm not sure if we should fix this on the demo or if we should open an issue on the instrumentation repo. If 101 shouldn't be an error, the instrumentation is wrong and should be fixed.
WDYT?!
I’d suggest fixing this in the instrumentation repo as well as here, since we don’t know when a fix there would be released or when it would reach this repository.
In that case we should do the following:
- [ ] Open an issue in the Elixir instrumentation repo
- [ ] Add a comment with the link to the issue in the Collector rule here
- [ ] Limit the scope of the rule to only Flagd-UI spans (at the moment the rule is configured to update ALL spans with status code 101)
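A hedged sketch of what the second and third items could end up looking like in the collector config; the service.name value, the comment wording, and the issue link placeholder are assumptions, not the actual rule from this PR:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Workaround: the Elixir instrumentation records HTTP 101 (WebSocket
          # upgrade) as an error. Remove once the upstream issue is resolved.
          # TODO: link the Elixir instrumentation issue here once it is opened.
          - set(status.code, STATUS_CODE_UNSET) where attributes["http.status_code"] == 101 and resource.attributes["service.name"] == "flagd-ui"
```

Scoping on resource.attributes["service.name"] keeps 101 responses from every other service untouched, which is the point of the third item.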
1- Is this repository the correct one for Elixir?
2- Do you mean I should add a comment to the changes here and include a link to the issue created in the Elixir repository?
3- Why not keep it global? I mean, 101 shouldn’t really be treated as an error, am I wrong?
1- Is this repository the correct one for Elixir?
yes
2- Do you mean I should add a comment to the changes here and include a link to the issue created in the Elixir repository?
Yes, same as we have done here: https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/otel-collector/otelcol-config.yml#L150
3- Why not keep it global? I mean, 101 shouldn’t really be treated as an error, am I wrong?
Good question. I wonder if other instrumentations set the span status to error as well. If so, then this is a specification problem.
I can take a further look at that next week.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
3- Why not keep it global? I mean, 101 shouldn’t really be treated as an error, am I wrong?
I've tried to find the reasoning behind 101 being treated as an error, but I couldn't find anything.
https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#what-constitutes-an-error
@tsloughter according to @jack5341 https://github.com/open-telemetry/opentelemetry-demo/pull/2677#issuecomment-3443325727:
The issue was caused by the protocol switch during WebSocket connections. HTTP 101 responses were being treated as errors, which is actually expected behavior for WebSockets.
Is there any reason why 101 would be an error in this context?
In OpenTelemetry pipelines it may appear as one because of how certain instrumentations interpret “non-final responses” or “upgrade responses” during the HTTP upgrade.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Bump.
The error span is actually from Envoy, right? What is the current status on this? Is something going wrong with how the Elixir service is closing or not closing the websocket?
I think the issue is that the flagd-ui is not actually closing the websocket in a timely manner.
This can be found in the network tab of the browser:
Envoy tags it as an error because it didn't get a response from flagd-ui, so I still believe the error is on the Elixir side of things.
Do you know how it gets asked to close it? Is it supposed to time out after not having more flags requested for a period and close or is the client ending the connection?
Do you know how it gets asked to close it? Is it supposed to time out after not having more flags requested for a period and close or is the client ending the connection?
Not really, this is handled by Envoy, right?!
The websocket will stay open until a timeout expires or it is told to close, so I'm wondering whether this is a case of the client closing, a timeout expected to go off on the server side, or Envoy having a timeout after which it closes websockets.
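For illustration, a hedged sketch of the Envoy knobs being discussed; the route match, cluster name, and timeout value are assumptions and are not taken from the demo's actual frontend-proxy config:

```yaml
route_config:
  virtual_hosts:
    - name: frontend
      domains: ["*"]
      routes:
        - match: { prefix: "/feature" }
          route:
            cluster: flagd-ui
            upgrade_configs:
              - upgrade_type: websocket   # allow the HTTP 101 upgrade through to flagd-ui
            idle_timeout: 300s            # Envoy closes the upgraded stream after 5 minutes with no data
```

If a route-level (or connection-manager-level) idle_timeout like this is in effect, Envoy ends the websocket itself once no frames flow, which would match the "Envoy could have a timeout" theory.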
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.