openvsx icon indicating copy to clipboard operation
openvsx copied to clipboard

Begin tracing

Open MatthewKhouzam opened this issue 1 year ago • 17 comments

Introduce a tracer to Open-VSX.

This tracer so far instruments the Query, QueryV2 and Search endpoints. It is used to see what kind of load is being placed on open vsx.

The output is in the format (https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview) and can be viewed using Chromium (https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/) or Eclipse Trace Compass (https://www.youtube.com/watch?v=YCdzmcpOrK4).

This patch aims to improve the observability of Open-VSX.

MatthewKhouzam avatar Dec 15 '23 04:12 MatthewKhouzam

fyi: @planger

vince-fugnitto avatar Dec 19 '23 15:12 vince-fugnitto

Minor update.

MatthewKhouzam avatar Jan 15 '24 19:01 MatthewKhouzam

This is a decent sized update, it fixes most things except for the code style. This I would leave up to the actual maintainers of the code.

I see this patch may improve making data accessible in a GDPR friendly anonymized way. Right now we only record API calls, not who is calling or what is returned. This is anonymous and can be "public".

MatthewKhouzam avatar Jan 16 '24 16:01 MatthewKhouzam

Thanks for the contribution @MatthewKhouzam. We use Spring Boot Actuator to manage and monitor the server. The data can be collected by Prometheus and displayed in Grafana, as outlined in this proof of concept: https://github.com/eclipse/openvsx/pull/667

What data are you looking to collect?

amvanbaren avatar Jan 23 '24 12:01 amvanbaren

The data would be internal. At the moment, you can see that the api is logged, but also some inner for loops. They would be corelated so you can see which query is slow and more importantly why it is slow.

To be clear, Spring Boot Actuator is good at monitoring, this is for investigating.

MatthewKhouzam avatar Jan 25 '24 19:01 MatthewKhouzam

There already is LongRunningRequestFilter for investigating.

amvanbaren avatar Jan 26 '24 11:01 amvanbaren

Either way, I would update this patch to use https://github.com/eclipse-tracecompass/trace-event-logger. Regarding LongRunningRequestFilter, as far as I can tell, it logs the request, not what happens inside the request. This would correlate a request to a flame chart, and flame graph.

https://www.youtube.com/watch?v=ZlAZy7GjXRE&pp=ygUWb2JzZXJ2YWJpbGl0eSB0aGVpYWNvbg%3D%3D

Here is an example of what we can show with this patch

https://github.com/eclipse/openvsx/assets/419535/b43490cc-e587-4289-96af-1ee1253f03da

MatthewKhouzam avatar Jan 26 '24 15:01 MatthewKhouzam

The reasoning of this patch is to not show when the code is slow. Which is handled well with Spring boot actuator, but rather what is making it slow. For that you either need deeper instrumentation. This is why chrome://tracing exists alongside, Otel. Both tools have their purpose. This patch allows chome tracing, perfetto or tracecompass to analyze the issue.

MatthewKhouzam avatar Jan 26 '24 15:01 MatthewKhouzam

Here is an example of what we can show with this patch

Interesting, this from your own or Ericsson's Open VSX instance?

For that you either need deeper instrumentation.

Right, like https://www.baeldung.com/spring-boot-opentelemetry-setup or https://grafana.com/blog/2022/05/04/how-to-capture-spring-boot-metrics-with-the-opentelemetry-java-instrumentation-agent/. I don't think handcoding the observability instrumentation is the way to go.

amvanbaren avatar Jan 26 '24 17:01 amvanbaren

Here is an example of what we can show with this patch

Interesting, this from your own or Ericsson's Open VSX instance?

It is from an instance of Trace Compass traced, as well as an example from an OS level trace. VSX was not shown as the data is not super interesting with the gatling traces. It was just a blob. :(

For that you either need deeper instrumentation.

Right, like https://www.baeldung.com/spring-boot-opentelemetry-setup

This is the problem

optel

or https://grafana.com/blog/2022/05/04/how-to-capture-spring-boot-metrics-with-the-opentelemetry-java-instrumentation-agent/ .

This is even less interesting as we are dealing with KPIs, which are inferred and aggregated. A lot of information is lost.

I don't think handcoding the observability instrumentation is the way to go.

That is a fair point, however, the tracer will be abstracted out soon. I have a project in eclipse for that. :) (Please thank IP team for their FAST response)

For better observability within VSX, you need to add spans using open telemetry, or something similar in this library.

Here is an example of what the end goal would look like.

image

All manual work is there to add precision beyond what an auto-instrumenting program can supply. Think of it like JVMTI profiler, if it instruments everything, your program will collapse under the weight of the probes. OTel just does the endpoints. We are going deeper, but you need to define how deep.

Here are some references that can help also, they are a bit deep though: https://www.dorsal.polymtl.ca/files/may2022/mekhlasi_may2022_performance_issues_detection.pdf https://amdls.dorsal.polymtl.ca/files/2019-05-06%20Progress%20Report%20Meeting.pdf https://www.dorsal.polymtl.ca/files/december2022/asangam_december2022_tracing_tools_low_latency.pdf

MatthewKhouzam avatar Jan 26 '24 18:01 MatthewKhouzam

For better observability within VSX, you need to add spans using open telemetry

Spring supports that: "We want to have some detailed observation of the MyUserService#userName method. Thanks to having added AOP support, we can use the @Observed annotation. To do so, we can register a ObservedAspect bean." source: https://spring.io/blog/2022/10/12/observability-with-spring-boot-3#webmvc-server-code

amvanbaren avatar Jan 29 '24 19:01 amvanbaren

Yes, on a per function level, not on a per-scope level.

A way to side step that would be to keep all functions simple though. :)

MatthewKhouzam avatar Jan 29 '24 19:01 MatthewKhouzam

@MatthewKhouzam, I still think that for an online service the tracing should be online, also, delivering the actual information at any moment of time. TraceCompass is an amazing tool, but I would prefer to have something OpenTelemetry-based, such as Jaeger. TraceCompass could then be used by the developers to find hotspots in the code while working on the code locally. But even then I would prefer some profile-style tool which doesn't need changes in the code, but uses some IDE integration instead. What do you think about it.

kkistm avatar Feb 02 '24 15:02 kkistm

I am OK with anything, as long as there's something. I was proposing the chromium/trace compass solution since it is something we used internally, and there are many features not available yet in jaeger. (Flamegraph, lookup stats, kernel correlation) but really... we just need something. :)

Regarding tracing seeing info at any moment, that's more monitoring, super important too. Post mortems and deep dives are where this patch was focusing.

MatthewKhouzam avatar Feb 02 '24 16:02 MatthewKhouzam

I'm in the process of setting up monitoring with Grafana Cloud. Once monitoring works I want to add tracing/observability using OpenTelemetry. Grafana offers flame graphs, but I don't know if it "just works" or needs extra configuration.

amvanbaren avatar Feb 05 '24 10:02 amvanbaren

Yes, on a per function level, not on a per-scope level.

You can also do it manually:

@Service
public class MyObservableService {

  private final ObservationRegistry observationRegistry;

  public MyObservableService(ObservationRegistry observationRegistry) {
    this.observationRegistry = observationRegistry;
  }

  public void tracedMethod() {
    Observation.createNotStarted("tracedMethod", this.observationRegistry)
      .lowCardinalityKeyValue("locale", "en-US")
      .highCardinalityKeyValue("userId", "77")
      .observe(() -> {
        // Write business logic here, all this code will now be traced/observed by this newly created "Observation" 
        // using the observationRegistry created earlier.
      });
  }
}

source: https://www.unlogged.io/post/enhanced-observability-with-java-21-and-spring-3-2 This is a good overview post: https://www.baeldung.com/spring-boot-3-observability

amvanbaren avatar Feb 05 '24 10:02 amvanbaren

I think #857 will do part of what this patch does.

If you want any feature of https://github.com/eclipse-tracecompass/trace-event-logger, you know where to find it.

MatthewKhouzam avatar Feb 15 '24 16:02 MatthewKhouzam