support-core-plugin icon indicating copy to clipboard operation
support-core-plugin copied to clipboard

THREAD_DUMPS_ON_SLOW_REQUEST: adding the base of the development of t…

Open ironcerocloudbees opened this issue 1 year ago • 6 comments
trafficstars

…his feature

Testing done

I tested this feature installing the snapshot version into a testing environment (in the last 2.452.2 LTS version) and run the following tests:

  1. The way to trigger the slow request was to run sleep(60000) groovy code in the Manage Jenkins -> Script Console in multiples tabs at the same time (around 10).
  2. I ran the above trigger a few times consecutively to confirm that we don't trigger the collecting operation while the RECURRENCE_PERIOD_MIN is working.
  3. I run the above trigger each 12-15 minutes for more than 2 hours to confirm that the SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN function works propertly.
  4. I changed the parameters (RECURRENCE_PERIOD_MIN, MINIMAL_SLOW_REQUEST_COUNT, etc) in the Jenkins startup to confirm that we can modify any of those parameters and the change affected the behavior of the new feature in the correct way.
  5. I generated a new support bundle and confirmed that the new files were included correctly.

Proof of the feature working:

image

UI changes:

image

JIRA:

https://issues.jenkins.io/browse/JENKINS-73431

Submitter checklist

  • [ ] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • [x] Ensure that the pull request title represents the desired changelog entry
  • [x] Please describe what you did
  • [x] Link to relevant issues in GitHub or Jira
  • [ ] Link to relevant pull requests, esp. upstream and downstream changes
  • [x] Ensure you have provided tests - that demonstrates feature works or fixes the issue

ironcerocloudbees avatar Jul 10 '24 10:07 ironcerocloudbees

The slow request records component already includes thread dumps, at least for the thread handling the HTTP request (and IIRC for any thread it is blocked on as well); what additional value comes from this? I did not follow the motivation in Jira.

jglick avatar Jul 10 '24 18:07 jglick

Hi @jglick, thank you so much for your comment.

As you said, the SlowRequest indeed captures the request thread. However, under my experience, making a performance analysis could be very tricky if we only have that information (the threads of the requests that hit the Slow Request feature).

If the performance problem is locally located at the level of the feature that we are consuming with this request (for example if we are running a script in the Script Console that is consuming a lot of time), the thread of the request could be enough. However, when Jenkins faces performance degradation caused by reasons other than a specific request, the threads of the requests are not enough to perform a performance analysis. In that kind of case, complete thread dumps (and, furthermore, some of them taken periodically) could be very useful.

I know that we have the complete thread dump in the High CPU checker. However, sometimes (and this is very common indeed), the performance degradation is not link to a high CPU usage scenario and this collection is not triggered.

In this moment, you could ask why we would need to record this information if we can take the set of thread dumps manually during the slowness episode. The answer is simple: Jenkins is usually used to run periodic tasks and automatic tasks when nobody is on the site at the moment of the performance degradation. Furthermore, with the introduction of Jenkins in K8s and the fact that the readiness and liveness probes restart the pod under this kind of situation, it is very complex to perform a "post-morten" analysis once the pod is restarted.

Kind regards,

ironcerocloudbees avatar Jul 10 '24 18:07 ironcerocloudbees

Just curious. Leaving review to @Dohbedoh.

jglick avatar Jul 10 '24 21:07 jglick

Another thing that bugs me is that if we tweak the settings, we could theoretically have several SlowRequestThreadDumpsGenerator threads running (when iterations x frequency > recurrence period) and they would generate dumps. redundantly. And that make me feel like the generator should probably be a singleton that manage and launch the task.

First of all, thank @Dohbedoh so much for your review. Regarding this last question, I trust your criteria better than mine. However, I would like to talk about the matter briefly before making the change.

As you said, we could tune the variables to have two threads running simultaneously, and we should try to avoid it. However, I am afraid of making the class singleton. We are saving files during the thread execution and we are using the Java tool to do it. So, in case other threads affect the performance of those libraries (or it was blocking part of that code), would it not cause any kind of deadlock (or at least a performance degradation)?

Knowing that the only benefit of the singleton here is to avoid the execution of two threads due to an incongruent variables setup, it would not be better if we check the assigned values of the variables and establish minimum values.

Kind regards,

ironcerocloudbees avatar Jul 23 '24 06:07 ironcerocloudbees

@ironcerocloudbees What I mean by singleton is a some kind of "Holder" for the generator Thread. To avoid creating new a SlowRequestThreadDumpsGenerator threads if one of them is actually still running. The condition whether we want to launch a generator thread (i.e. checkThreadDumpsTrigger) should not just be based on recurrence period but also on whether a dump generator is already running.

Dohbedoh avatar Jul 29 '24 06:07 Dohbedoh

Hi @Dohbedoh,

Thank you again for your time here. I pushed a new version using a semaphore to avoid concurrencies. I have been reviewing different implementations of the Singleton for threads, and the semaphore looks a good option for us. If you could review it it would be great.

Kind regards,

ironcerocloudbees avatar Aug 02 '24 13:08 ironcerocloudbees