ultimate icon indicating copy to clipboard operation
ultimate copied to clipboard

Performance Improvements around hashCode and equals

Open maul-esel opened this issue 2 years ago • 8 comments

Consistently use reference equality for transitions, events and conditions:

  • for performance reasons -- comparing sets is expensive;
  • and because in practice two different instances representing "equal" transitions / events / conditions should never be created.

Also, do not use consecutive numbers (e.g. serialNumber) as hash codes; this can cause frequent hash collisions for sets, maps or lists of transitions / events / conditions.

maul-esel avatar Aug 26 '22 15:08 maul-esel

For the comparison I used by laptop, the materialistic benchmark set, and only differencePairwise.epf

First of all I did three runs on the same commit to demonstrate the usual fluctuation between different benchmark runs.

1st run on dev aa1133877f641a781b40cccc91a0d94af80389c9 Runtime 488s CsvAutomataOperationStatistics9983638_2022-08-26_17-40-29-914.csv

2nd run on dev aa1133877f641a781b40cccc91a0d94af80389c9 Runtime 470s CsvAutomataOperationStatistics1540219992_2022-08-26_17-51-35-185.csv

3rd run on dev aa1133877f641a781b40cccc91a0d94af80389c9 Runtime 462s CsvAutomataOperationStatistics1540219992_2022-08-26_18-04-16-041.csv

Run on 1st commit 273a740b18e21df64684e64ccbd3850467443978 Runtime 336s CsvAutomataOperationStatistics1540219992_2022-08-26_18-16-26-112.csv

Run on 2nd commit 1d81b42cc5a43d74dc5159719fd866fc795d1bb3 Runtime 339s CsvAutomataOperationStatistics1540219992_2022-08-26_18-37-56-775.csv

Run on 3rd commit fa7f963c4f30bd571ddec19f9c087b0c5ed1628c Runtime 343s CsvAutomataOperationStatistics2008907751_2022-08-26_19-36-31-697.csv

Run on 4th commit 0701f203130b18eb03d65f75545292fcbdb27292 Runtime 392s (maybe already slower) CsvAutomataOperationStatistics559559860_2022-08-26_19-52-19-260.csv

Run on 5th commit 138d689f235eb087f136badc52e3485edd6c3b58 Runtime 4045s (!!! no typo) CsvAutomataOperationStatistics1540219992_2022-08-26_21-05-25-416.csv

Run on 6th commit 95b21d5a1843a0e80fed0d417808493c3a870515 Runtime 4249s (!!! no typo) CsvAutomataOperationStatistics2008907751_2022-08-26_21-23-45-603.csv

Heizmann avatar Aug 27 '22 13:08 Heizmann

For the comparison I used by laptop, the materialistic benchmark set, and only differencePairwise.epf

Runtime on 3e9ae16 471s (before transition hashcode bug) CsvAutomataOperationStatistics9983638_2022-08-29_21-56-18-250.csv

Runtime on 8e5ba81897 481s (fixed transition hashcode bug) CsvAutomataOperationStatistics1540219992_2022-08-29_22-12-48-267.csv

Runtime on 0034fd1df9 331s (reference equality Event) CsvAutomataOperationStatistics2008907751_2022-08-29_22-21-58-397.csv

Runtime on a69b4bd23d 333s CsvAutomataOperationStatistics1540219992_2022-08-29_22-49-16-569.csv

Runtime on 0612a5f3b2 332s CsvAutomataOperationStatistics1540219992_2022-08-29_23-03-19-951.csv

Runtime on 1a063de8a0 390s (HashJenkins for Conditions) CsvAutomataOperationStatistics903064416_2022-08-29_23-17-32-749.csv

Runtime on 01280f3148 389s CsvAutomataOperationStatistics559559860_2022-08-29_23-28-39-629.csv

Runtime on c3615b72dd 465s CsvAutomataOperationStatistics1540219992_2022-08-29_23-48-56-993.csv

Runtime on c5bc13a0fd 457s CsvAutomataOperationStatistics1540219992_2022-08-30_00-10-39-906.csv

Runtime on 3902331 446s CsvAutomataOperationStatistics1540219992_2022-08-30_01-42-59-379.csv

I am puzzled and I do not yet have an explanation.

Heizmann avatar Aug 29 '22 23:08 Heizmann

Further analysis part 1: ordinary hashing vs. hashJenkins If I change die computation of hashcodes in 39023316d06d0419909ad6061035b121ff7c46d3 for Transition, Event, and Condition from hashJenkins to a multiplication by 17 the runtime is 463s. (roughly similar to hashJenkins)

Heizmann avatar Sep 02 '22 11:09 Heizmann

Further analysis part 2: Are there hash collisions? I wanted to find out if using serial numbers resp. IDs avoids hash collisions in HashSets and HashMaps. I don't really know how I can do this. Newer Eclipse versions changed the presentation of variable values for Sets/Maps. In the past one could navigate through the original data structures (usually not userfriendly for debugging) now one can only see a list of all elements (usually great but disastrous for my purpose). I randomly watched some lager sets and maps while debugging. Elements are not always perfectly ordered. But sometimes the objects with hash k are at position k which, I guess, hints that there a fewer hash collisions. It's probably confirm this assumption by some experiment.

Heizmann avatar Sep 02 '22 12:09 Heizmann

Further analysis part 3: Are hash collisions costly? I implemented the following loop. If DO_HASH is set the runtime is 209s, otherwise the runtime is 54s.

	private static final boolean DO_HASH = false;

	public static void main(final String[] args) {

		for (int i=0; i<1_000; i++) {
			final HashSet<Integer> theSet = new HashSet<>();
			for (int j=0; j<1_000_000; j++) {
				if (DO_HASH) {
					theSet.add(HashUtils.hashJenkins(17, j));
				} else {
					theSet.add(j);
				}
			}
		}
		System.out.println("Done.");
	}

I also redid the experiment with LinkedHashSets. If DO_HASH is set the runtime is 276s, otherwise the runtime is 86s.

Heizmann avatar Sep 02 '22 12:09 Heizmann

Further analysis part 4: Does the hash code has an effect on the performance in verification?

I run the Svcomp20AutomizerConcurrentSpeedBenchmarks and compared 8e5ba81 , 0612a5f , and 3902331. In the following file I interleaved the lines of the IncrementalLogWithBenchmarkResults. interleaved.txt

One cannot see a difference in the runtime.

The benchmark is however only helpful for finding substantial differences in the runtime for the following reasons.

  • Slowest solved benchmark takes 16s
  • Largest Petri net in solved benchmark has only around 3500 flow.
  • Different numbers of iterations hint that different counterexamples were taken.

Heizmann avatar Sep 03 '22 10:09 Heizmann

I ran the SV-COMP Concurrency benchmarks on commits 0612a5f3 (before hashJenkins) and 39023316 (after hashJenkins). The results can be found here and do show as significant a difference.

maul-esel avatar Sep 21 '22 12:09 maul-esel

The changes between these commits do not simply introduce hashJenkins, they also change some things slightly (somewhat bigger Condition, hashJenkins that does not consider other fields anymore in Transition).

This benchmark seems to be most affected. image

danieldietsch avatar Sep 21 '22 14:09 danieldietsch

OK, I've investigated one of the benchmarks further (weaver/parallel-min-max-1). The hash code changes seem to affect both the number of iterations and the time spent per iteration. I tested all combinations of "serial number" hashcodes (s) and hashJenkins (j) for transitions, conditions and events:

transition / condition / event / result
-------------------------------------------
s / s / s   11 iterations, 29s
s / s / j   11 iterations, 31s
s / j / s   >= 16 iterations, >7min
s / j / j   >= 15 iterations, >4min
j / s / s    9 iterations, 15s
j / s / j    9 iterations, 26s
j / j / s   11 iterations, 33s
j / j / j    9 iterations, 85s

maul-esel avatar Nov 04 '22 11:11 maul-esel

The different numbers of iterations seem to suggest that some code, perhaps in the unfolding & selection of counterexamples, is iterating over a HashSet of events or conditions, perhaps? I'll debug further after lunch.

maul-esel avatar Nov 04 '22 11:11 maul-esel

OK, one problem is that PetriNetUnfolder::constructRun(Event, ConditionMarking) iterates over event.getPredecessorConditions(), which is a HashSet and thus the iteration order depends on the hash codes. As a result, different counterexamples are returned, changing the refinement iterations.

Though that is not the only problem: Even with a fix, I get different counterexamples in later iterations. In this case, the events leading to an accepting condition already have different serial numbers. I did not debug further to find out why.

For now, I'll merge this PR as-is. I'll leave it to you, @Heizmann, if you want to follow up on these issues in the future.

maul-esel avatar Nov 04 '22 15:11 maul-esel