Only a subset of mock requests appear in Live Traces (EKS + Istio + Microcks)
Hi,
I've deployed Microcks on EKS using my own custom Helm chart, but I made sure to include all the required configurations from the official Microcks Helm chart. My application is running inside an Istio service mesh. Everything appears to be working correctly overall.
However, I've noticed an issue with API Live Traces. When I call the mock URL, only 2-3 out of every 10 requests show up in the Live Traces UI. Is this expected behavior? If not, what could be causing this? I don't see any errors or exceptions in the Microcks pod logs.
For context, I have 4 pods running in the Deployment behind a headless service.
@karthikrajkkr
Welcome to the Microcks community!
Thanks and congrats for opening your first issue here! Be sure to follow the issue template or please update it accordingly.
If you're using Microcks in your organization, please add your company name to this list. It really helps the project to gain momentum and credibility. It's a small contribution back to the project with a big impact.
If you need to know why and how to add yourself to the list, please read the blog post "Join the Microcks Adopters list and Empower the vibrant open source Community".
Hope you have a great time there!
Hi @karthikrajkkr
This is a super interesting configuration. I think what you observe is due to the sampling setting of our OTLP tracing support. The default setting is as follows:
management.otlp.tracing.sampling.probability=0.2
which means that only 20% of detailed traces are actually sent to the OTLP collector. I think you may be able to override this by setting an environment variable like MANAGEMENT_OTLP_TRACING_SAMPLING_PROBABILITY to the value that suits you best.
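For reference, a minimal sketch of that override in a Deployment manifest - the container name here is just an assumption, adapt it to however your custom chart templates the container spec:

```yaml
# Sketch: set the sampling override on the Microcks container.
# A value of "1.0" records every request; lower it to sample a fraction.
spec:
  template:
    spec:
      containers:
        - name: microcks  # assumed container name, adjust to your chart
          env:
            - name: MANAGEMENT_OTLP_TRACING_SAMPLING_PROBABILITY
              value: "1.0"
```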
Let me know if it solves this issue.
I think the problem is that the Live Traces events currently do not work in a distributed way. Each Microcks pod processes mock events locally in-memory, and the Live Traces UI only displays the events handled by the specific pod you're connected to. As you're running 4 pods behind a headless service, your requests are being spread across pods. As a result, only a fraction of your calls appear in the UI, depending on which pod your UI session is attached to at that moment.
I think there are multiple ways to handle this problem:
- We could integrate the Live Traces pipeline with a distributed eventing system such as RabbitMQ or Kafka, allowing events to be centralized and consumed by all replicas.
- Or we could export all the traces to an OpenTelemetry backend and make Microcks query Live Traces from that backend (a collector sketch follows below). With this strategy, though, we would lose the instant feedback loop, as we would poll for events instead of just receiving them.
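To make the second option concrete, here is a minimal OpenTelemetry Collector config sketch that every Microcks replica could export to, so all traces land in one queryable place - the Tempo endpoint is purely an illustrative assumption:

```yaml
# Sketch: a central OpenTelemetry Collector receiving OTLP traces
# from all Microcks replicas and forwarding them to one backend.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp:
    # Illustrative backend address; point this at your own tracing backend.
    endpoint: tempo.observability.svc.cluster.local:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```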
Hello @Apoorva64,
Thank you for identifying the root cause. I believe the first option is the best approach. Do you have any plans to integrate this option in future release versions?
Hey @karthikrajkkr and @Apoorva64
I apologize for my previous response; I think I answered too quickly. Thanks, @Apoorva64, for providing the correct answer.
I think that what you're suggesting here is beyond the initial scope of "Live Traces," which was to provide more insights for tuning and troubleshooting the dispatcher settings, as well as offering additional insights into how the mocking engine resolves incoming requests. It was primarily intended for use on a single local instance, where the cost and complexity of deploying a full OTEL stack (LGTM or others) can be too high.
Thus, I agree with @karthikrajkkr that not being able to see all the live traces in an environment with many replicas can be disturbing for end-users. But instead of going into other developments that - in my opinion - are beyond the scope of Microcks, I would propose other options:
- Add a flag to disable "Live Traces" in an environment with many replicas. Sometimes, "no feature" is better - or less disappointing - than a half-working feature in some environments,
- Add a warning in the "Live Traces" panel, saying that "you may only see a portion of Live Traces if you're running in a distributed environment". An explanatory message is always educational and lowers frustration,
- Allow the addition of a custom link to an external OTEL / LGTM / Grafana dashboard that will provide the end-user with full observability and comprehensive traces. Having such an observability dashboard is - IMHO and after all - a recommended best practice in environments with many replicas.
And of course, the above options are not exclusive, and we can decide to implement several of them! What do you think?
Thinking about it ... another way to work around this - if the Microcks instances are running on the developer's machines - is to have session stickiness enabled at the Kubernetes Service or Ingress/HTTPRoute level. This way, developers can be routed to the same Microcks pod as their app and view all their live traces.
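As an illustration, here is a minimal Istio sketch of that stickiness - assuming a regular (non-headless) ClusterIP Service named `microcks`, since Istio load-balancing policies don't apply to headless services; adjust the host and hashing key to your setup:

```yaml
# Sketch: pin each client to one Microcks pod via consistent hashing.
# The host value is an assumption; replace with your Service's FQDN.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: microcks-sticky
spec:
  host: microcks.microcks.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        useSourceIp: true
```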