jmc icon indicating copy to clipboard operation
jmc copied to clipboard

7879: Automated Analysis taking very long time to produce results for Class Leak Rule and showing wrong results.

Open Suchitainf opened this issue 3 years ago • 11 comments

This PR takes care of optimizing and correcting the Class leak rule results on Automated Analysis Page.


Progress

  • [x] Commit message must refer to an issue
  • [ ] Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JMC-7879: Automated Analysis taking very long time to produce results for Class Leak Rule and showing wrong results.

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jmc pull/419/head:pull/419
$ git checkout pull/419

Update a local copy of the PR:
$ git checkout pull/419
$ git pull https://git.openjdk.org/jmc pull/419/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 419

View PR using the GUI difftool:
$ git pr show -t 419

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jmc/pull/419.diff

Suchitainf avatar Aug 20 '22 18:08 Suchitainf

:wave: Welcome back schaturvedi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Aug 20 '22 18:08 bridgekeeper[bot]

Is https://bugs.openjdk.org/browse/JMC-7879 a duplicate of https://bugs.openjdk.org/browse/JMC-7248?

It looks like this PR is #248 with some of the review comments addressed, should the former PR be closed in favor of this?

There are a couple of review comments that didn't receive any changes in this PR: https://github.com/openjdk/jmc/pull/248/files#r756349943 https://github.com/openjdk/jmc/pull/248/files#r756353644

It looks like the consensus on #248 was that it was good enough to go in, maybe ping @bric3 and/or @Gunde for a re-review/approval?

This PR is a superset of the previous one. The older one deals with only optimization of the class leak rule but this one deals with the optimization as well as the correction of the results shown on automated analysis page for class leak rule. I have implemented all the review comments given on the previous PR. With respect to the two comments mentioned above, the configured timeout is being fetched from preference hence I am finding it difficult to make it final. The interrupted exception handling is done where the method is called in catch block. I had already pinged about this issue in general slack channel and it was discussed in hangout also.

@Gunde and @bric3 Request both of you to please take a look at this PR and let me know your review comments. Thanks in advance.

Suchitainf avatar Aug 29 '22 22:08 Suchitainf

@Suchitainf Sorry for the lack of response, I was on vacation in August, and got some stuff at work to handle, I should have more luck next week. Thanks for your patience !

bric3 avatar Sep 22 '22 15:09 bric3

Hi @Suchitainf! When will you be able to fix the remaining issues?

thegreystone avatar Oct 04 '22 14:10 thegreystone

@Gunde @bric3 I have tried to implement all the review comments. Please re-review the PR and let me know if there is something which is still need to be corrected.

Suchitainf avatar Oct 05 '22 18:10 Suchitainf

@Suchitainf There's still these two comment to address in my opinion:

  • https://github.com/openjdk/jmc/pull/419#discussion_r987769827
  • https://github.com/openjdk/jmc/pull/419#discussion_r988313757

Note I'm really unsure about the result ordering there, may be @thegreystone or @Gunde may chime in there to make sure this change is ok.

bric3 avatar Oct 05 '22 20:10 bric3

@bric3 We need to show the names of top 5 classes which were loaded maximum times without being unloaded. So the ordering matters here. That's why I have skipped that comment.

Suchitainf avatar Oct 06 '22 04:10 Suchitainf

@bric3 We need to show the names of top 5 classes which were loaded maximum times without being unloaded. So the ordering matters here. That's why I have skipped that comment.

I'm a bit confused, as in DefaultIItemResultSet the insertion in data order is dependent on the iteration order of the aggregate collection, since it's a set, there's no ordering guarantees. Although I lack knowledge of the actual type and the implementation is a TreeSet or something alike.

https://github.com/openjdk/jmc/blob/1696e03a1fddb9cc68d32bf21a281ef24366e552/core/org.openjdk.jmc.flightrecorder.rules.jdk/src/main/java/org/openjdk/jmc/flightrecorder/rules/jdk/util/DefaultIItemResultSet.java#L95-L99

Moreover the usage of the thread pool here will not guarantee the add operation will happen in the same order as the tasks have been submitted. Which leads me to think that the code should be reworked here to ensure that the results will be inserted in order.

I have used a synchronized block for data. After that I am getting consistent results. Without that I was getting different results every time. I mean when I was working for test JFR wldf.jfr without synchronized block on data, the top 5 results varied every time.

Suchitainf avatar Oct 06 '22 19:10 Suchitainf

I have used a synchronized block for data. After that I am getting consistent results. Without that I was getting different results every time. I mean when I was working for test JFR wldf.jfr without synchronized block on data, the top 5 results varied every time.

Have you tried with an intermediate ConcurrentLinkedQueue, then copy the results in the data array list ?

bric3 avatar Oct 07 '22 07:10 bric3

@bric3 I have tried two intermediate data structures but both implementations are giving wrong results.

Approach 1: private final List<Object[]> processingList = Collections.synchronizedList(new ArrayList<Object[]>()); //Creation of intermediate data structure List processingList.add(row); //Adding row without synchronized block OR synchronized (processingList) { //Adding row with synchronized block processingList.add(row); } data.addAll(processingList); // Finally adding the processed list to the Arraylist.

Approach 2: private final ConcurrentLinkedQueue<Object[]> processingQueue = new ConcurrentLinkedQueue<Object[]>(); //Creation of intermediate data structure ConcurrentLinkedQueue processingQueue (row); //Adding row without synchronized block OR synchronized processingQueue { //Adding row with synchronized block processingQueue (row); } data.addAll(processingQueue); // Finally adding the processed Queue to the Arraylist.

To be more precise, by wrong results, I mean the size of data structure (I am trying printing while debugging) and the final top 5 results are not consistent.

Suchitainf avatar Oct 10 '22 12:10 Suchitainf

This PR is with respect to an issue raised by one of our customers where Class Leak Rule is taking 60-90 minutes to evaluate the results. The issue was raised on Sat Apr 24 2021. Its been more than a year and we couldn't give any solution to our customers, they are still living with that unacceptable calculation time. Now they are chasing us with several messages and mails to understand when will they receive a fix for it. They have marked it as P2.

Hence, considering below points:

  1. We are reducing the rule calculation time from 60-90 mins to less than 60 seconds. These calculation timings are with respect to the JFR shared by customer while raising the issue in BugDB (Oracle Internal portal for raising issues).
  2. We are correcting the results shown for Class Leak Rule and the correction is a drastic change. The existing approach is showing everything in GREEN even if there is a big issue and HIGH ALERT should be given as part of this rule results. This PR is fixing that issue and showing the correct result to the customers.
  3. We have a fix which is not breaking anything as such. No Critical or Must DO review comment is left out unattended.

Is it possible to accept this PR for 8.3 release and create a new lesser priority enhancement request if there are any modifications required with respect to re-designing of this Rule or using some other data structure?

@thegreystone @Gunde @aptmac @bric3 I would like to know your views on this. Thanks in advance.

Suchitainf avatar Oct 20 '22 19:10 Suchitainf

@aptmac, @Gunde & @bric3 - can you please help Suchita complete this PR?

thegreystone avatar Oct 21 '22 22:10 thegreystone

@aptmac @bric3 I have implemented the review comments suggested.

@bric3 Yes, you are right " In DefaultIItemResultSet the insertion in data order is dependent on the iteration order of the aggregate collection, since it's a set, there's no ordering guarantees. " We are just processing the records in DefaultIItemResultSet. Sorting is taken care in ClassLeakingRule.java class at line no 182 ( entries.sort(null); ) after all the records are processed.

The reason for why wrong records were shown previously was : We were showing the results in ascending order of the count i.e. difference between the number of times classed loaded vs unloaded. Whereas we had to show in descending order. For Ex: There are 10 classes A (0), B (1), C (1)........H (55), I (60), J (75). As per current implementation we are showing A, B, C on class leak rule results on Automated Analysis Screen and saying everything is normal. Whereas, we have to show J, I, H classes because they are loaded without being unloaded and they need attention. So JMC has to show warning instead of showing OK results.

Regarding test case, we already have a JFRRuleBaseline.xml for testing rules and their results and I have updated that as part of this PR.

Suchitainf avatar Nov 11 '22 12:11 Suchitainf

Hi @bric3 @Gunde @aptmac , Did you get a chance to check the latest patch? Please let me know what else is required, so that we can close this PR.

Suchitainf avatar Dec 16 '22 07:12 Suchitainf

In the latest patch, I have removed the synchronized block which was the main bottle neck for the approval of this PR. Also, used the suggested data structure i.e. ConcurrentLinkedQueue for the processing of records.

I have debugged in detail and figured out that the reason for inconsistent results for the customer JFR which has around 65000 class entries to process was the time multithreaded tasks were taking to complete which was dependent on many factors like how many cores our CPU has OR how busy our CPU is when its performing those tasks. Hence, for safer side I have added a check to see whether all tasks are performed or not. If not, I have added a console message to alert the user for increasing the configured timeout. The default timeout is 5 min which is quite sufficient to complete all the tasks still if the count of entries are too much and there are some pending tasks, user will be alerted for the same.

@bric3 Please let me know when can we have a zoom (if required).

Suchitainf avatar Jan 08 '23 19:01 Suchitainf

Hi @Suchitainf I'll have a look tommorrow.

bric3 avatar Jan 17 '23 15:01 bric3

@Suchitainf This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

7879: Automated Analysis taking very long time to produce results for Class Leak Rule and showing wrong results.

Reviewed-by: hdafgard, bdutheil

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Jan 19 '23 12:01 openjdk[bot]

/integrate

Suchitainf avatar Jan 19 '23 16:01 Suchitainf

Going to push as commit 2b1b5972738b97ba60f0b03960fda6d1994ce1dc.

openjdk[bot] avatar Jan 19 '23 16:01 openjdk[bot]

@Suchitainf Pushed as commit 2b1b5972738b97ba60f0b03960fda6d1994ce1dc.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Jan 19 '23 16:01 openjdk[bot]