support-core-plugin
support-core-plugin copied to clipboard
THREAD_DUMPS_ON_SLOW_REQUEST: adding the base of the development of t…
…his feature
Testing done
I tested this feature installing the snapshot version into a testing environment (in the last 2.452.2 LTS version) and run the following tests:
- The way to trigger the slow request was to run
sleep(60000)groovy code in theManage Jenkins -> Script Consolein multiples tabs at the same time (around 10). - I ran the above trigger a few times consecutively to confirm that we don't trigger the collecting operation while the
RECURRENCE_PERIOD_MINis working. - I run the above trigger each 12-15 minutes for more than 2 hours to confirm that the
SLOW_REQUEST_THREAD_DUMPS_TO_RETAINfunction works propertly. - I changed the parameters (
RECURRENCE_PERIOD_MIN,MINIMAL_SLOW_REQUEST_COUNT, etc) in the Jenkins startup to confirm that we can modify any of those parameters and the change affected the behavior of the new feature in the correct way. - I generated a new support bundle and confirmed that the new files were included correctly.
Proof of the feature working:
UI changes:
JIRA:
https://issues.jenkins.io/browse/JENKINS-73431
Submitter checklist
- [ ] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
- [x] Ensure that the pull request title represents the desired changelog entry
- [x] Please describe what you did
- [x] Link to relevant issues in GitHub or Jira
- [ ] Link to relevant pull requests, esp. upstream and downstream changes
- [x] Ensure you have provided tests - that demonstrates feature works or fixes the issue
The slow request records component already includes thread dumps, at least for the thread handling the HTTP request (and IIRC for any thread it is blocked on as well); what additional value comes from this? I did not follow the motivation in Jira.
Hi @jglick, thank you so much for your comment.
As you said, the SlowRequest indeed captures the request thread. However, under my experience, making a performance analysis could be very tricky if we only have that information (the threads of the requests that hit the Slow Request feature).
If the performance problem is locally located at the level of the feature that we are consuming with this request (for example if we are running a script in the Script Console that is consuming a lot of time), the thread of the request could be enough. However, when Jenkins faces performance degradation caused by reasons other than a specific request, the threads of the requests are not enough to perform a performance analysis. In that kind of case, complete thread dumps (and, furthermore, some of them taken periodically) could be very useful.
I know that we have the complete thread dump in the High CPU checker. However, sometimes (and this is very common indeed), the performance degradation is not link to a high CPU usage scenario and this collection is not triggered.
In this moment, you could ask why we would need to record this information if we can take the set of thread dumps manually during the slowness episode. The answer is simple: Jenkins is usually used to run periodic tasks and automatic tasks when nobody is on the site at the moment of the performance degradation. Furthermore, with the introduction of Jenkins in K8s and the fact that the readiness and liveness probes restart the pod under this kind of situation, it is very complex to perform a "post-morten" analysis once the pod is restarted.
Kind regards,
Just curious. Leaving review to @Dohbedoh.
Another thing that bugs me is that if we tweak the settings, we could theoretically have several SlowRequestThreadDumpsGenerator threads running (when iterations x frequency > recurrence period) and they would generate dumps. redundantly. And that make me feel like the generator should probably be a singleton that manage and launch the task.
First of all, thank @Dohbedoh so much for your review. Regarding this last question, I trust your criteria better than mine. However, I would like to talk about the matter briefly before making the change.
As you said, we could tune the variables to have two threads running simultaneously, and we should try to avoid it. However, I am afraid of making the class singleton. We are saving files during the thread execution and we are using the Java tool to do it. So, in case other threads affect the performance of those libraries (or it was blocking part of that code), would it not cause any kind of deadlock (or at least a performance degradation)?
Knowing that the only benefit of the singleton here is to avoid the execution of two threads due to an incongruent variables setup, it would not be better if we check the assigned values of the variables and establish minimum values.
Kind regards,
@ironcerocloudbees What I mean by singleton is a some kind of "Holder" for the generator Thread. To avoid creating new a SlowRequestThreadDumpsGenerator threads if one of them is actually still running. The condition whether we want to launch a generator thread (i.e. checkThreadDumpsTrigger) should not just be based on recurrence period but also on whether a dump generator is already running.
Hi @Dohbedoh,
Thank you again for your time here. I pushed a new version using a semaphore to avoid concurrencies. I have been reviewing different implementations of the Singleton for threads, and the semaphore looks a good option for us. If you could review it it would be great.
Kind regards,