java-specialagent
java-specialagent copied to clipboard
improve startup time
Several of our users have complained about the startup time of their services being multiplied by several factors when using the special agent. We even already use a trimmed-down version with most of the rule plugins (integrations) removed (sync'ed to v1.7.0 of the specialagent code) with an integrated Wavefront Tracer (using their SDK not their bundle; other tracers are removed from our build of specialagent). Even so, services sometimes see a 10x or greater increase to their start time (e.g. from 3 sec to 30).
On a dummy app that simply exits immediately, execution time without the agent was less than 100 ms. With the agent it is over 10 sec. Profiling/tracing the startup of the agent can be tricky (e.g. YourKit can't do this), so I simply took the poor-man's approach and immediately started a new thread that sleeps 13 ms and then prints a thread dump (infinite loop). This was run on a 4-core VM, and top showed basically 100% (1 full core) utilized for the entire 10 sec. I created a created a flame graph of the resulting thread dumps using https://github.com/brendangregg/FlameGraph.
The result gives a decent view of where the time is spent.
startup_13ms_samples_main_only.zip
(You'll need to download the file, unzip it, and view it in a browser since github doesn't support uploading SVG files directly). But here is a screenshot as well (after clicking into the main thread):

This shows that the majority of time is spent in bytebuddy code, which makes sense. So anything we can do to optimize byte buddy time will be good. But we can see that the majority of time is divided into a few main pieces:
BootLoaderAgent.premain (about 25%) AgentRule$$Access.init (about 40%) SpecialAgent.loadAdapter (about 20%, mostly loading the Wavefront SDK stuff)
Is there any potential to utilize multiple threads across any of the 3 above?
I am a ByteBuddy novice for sure, but I will do what I can to help with this effort.
It is likely that using a better profiler would help with the data fidelity as well.
Hi @randallt, thank you for this analysis! We have not applied a dedicated focus on profiling SpecialAgent, but its startup time does need some attention. The crux of performance hit is going to be the re/transformation operations that ByteBuddy -> ASM -> java.lang.instrument.Instrumentation has to perform to instrument the bytecode. The time that SpecialAgent needs to take to instrument the bytecode is linear with respect to the size of the bytecode (in bytes). If application (a) is 1.5x the size of application (b), then application (a) will take 1.5x the time to instrument compared to (b). There is no way around this, because ByteBuddy -> ASM -> java.lang.instrument.Instrumentation has to scan the entirety of the VM to potentially instrument matching sections.
One way to reduce the startup time for your own application is to trim down the codebase being packaged into the application. Spring and Spring Boot are notorious for importing everything (and the kitchen sink) into an application, regardless of whether it's just to print "Hello World".
There are a few things that could be done to improve performance:
-
Rule Chaining Another important aspect that results in longer startup times is the way rules are defined. Prior to SpecialAgent v1.7.0, all rules were independently applied to the application. This resulted in a re/transformation round for each rule. The work I did for SpecialAgent v1.7.0 was to chain as many rules into a single re/transformation round as possible. There were a few rules, however, that could not be chained. Reason being that these rules defined intercepts for the same class, but applied to different versions of the same library (Lettuce 5.0, 5.1, 5.2, and Spring WebMVC 3, 4, 5). Thanks to @malafeev's work, he was able to merge all Lettuce rules into in one (#544), and just recently all Spring WebMVC rules into one (#551 #554). The result of this work will be to be able to add the Lettuce and Spring WebMVC rules into the same re/transformation chain.
-
BootLoaderAgentandClassLoaderAgentAnother important aspect that results in longer startup times is theBootLoaderAgent. This agent is a core part of SpecialAgent's architecture, as it allows for plugin code to be injected into the bootstrap class loader for any version of the Java MV (1.7, 1.8, 9+). TheBootLoaderAgentintercepts very many calls, so a slight improvement in its runtime would result in a significant benefit.
Is there any potential to utilize multiple threads across any of the 3 above?
Unfortunately, delaying the initialization by utilizing threads would simply result in the application starting sooner, but the codebase being partially instrumented.
It is likely that using a better profiler would help with the data fidelity as well.
Yes, you're right! Thank you for the SVG -- it provides a good high-level breakdown of bottlenecks in SpecialAgent. I'll use it to guide my focus.
Hi @randallt, there have been a few updates to SpecialAgent concerning this issue, which include:
- Chain rules in
BootLoaderAgent - Removal of last "unchained" agent rule: Spring Web
- Merge unchained
spring-webmvc-*rules into a single chained rule - Merge unchained
lettuce-*rules into a single chained rule
These changes address the relatively "low hanging fruit" items to improve startup time. There are a number of more involved items that would also help, which are:
- Optimize bytecode injection mechanism in
ClassLoaderAgent. - Optimize each individual ByteBuddy transformation in each
AgentRulesubclass.
We will release SpecialAgent v1.7.2 this weekend, which will include the former updates. If you have any feedback regarding your own observations, this would be welcome!
Hi @safris, only (1) above applies to my build, as I don't include the spring or lettuce rules at all. That said, this change has had a great effect. For a simple service with the tracing agent, startup time went from 20s to 13s seconds (a 35% reduction), with the premain execution time going from 12s down to 7s (42% reduction).
Great work, and I hope there's more where that came from!
On a larger application, this change dropped the startup time about 20 sec, or 29%, from about 70 sec down to 50 sec. I haven't seen any change in function.
I see this issue is currently in queue for milestone v1.7.4. Curious to know if a fix is still forthcoming?
Seeing long startup times (approx 5 minutes) with tomcat:7.0.103-jdk8 environments.
Some more context on the startup environment:
export LS_AGENT_FILE="${LS_AGENT_FILE:-opentracing-specialagent-1.7.3.jar}"
export LIGHTSTEP_VARS="-Dsa.exporter=lightstep -Dls.collectorHost=$LS_HOST -Dls.collectorProtocol=$LS_PROTOCOL -Dls.collectorPort=$LS_PORT -Dls.componentName=$LS_SERVICE -Dls.tags='environment:${LS_ENVIRONMENT},version:${LS_SERVICE_VERSION},hostname:${LS_HOSTNAME}' -Dls.accessToken=$LS_TOKEN -Dls.propagator=b3"
export CATALINA_OPTS="$CATALINA_OPTS $LIGHTSTEP_VARS -javaagent:$(pwd)/${LS_AGENT_FILE}"