Potential performance improvements for `findIterations()`
The current `findIterations` method is too slow for my use case. I intend to re-evaluate the number of iterations at random intervals to tailor the parameters to my hardware, and the current implementation is too inefficient and wastes a lot of resources when it runs multiple times during the application's lifetime.
I have written a custom implementation that takes advantage of the fact that the hashing time grows roughly linearly with the number of iterations.
Hashing time on my laptop for $$m = 8192$$ and $$p = 4$$ with an increasing number of iterations:
My custom `findIterations` method:
```java
private int measure(long start) {
    final long end = System.nanoTime();
    return (int) ((end - start) / 1_000_000);
}

protected int findIterations(Argon2Factory.Argon2Types type, int maxMillis, int m, int p) {
    final var argon = Argon2Factory.create(type);
    warmup(argon, "password".toCharArray());
    // first do a single iteration and see where we're at
    int iterations = 1;
    long start = System.nanoTime();
    argon.hash(iterations, m, p, "password".toCharArray());
    int took = measure(start);
    // if one iteration already takes more than maxMillis, use one iteration
    if (took > maxMillis) return 1;
    // if one iteration takes less than a third of maxMillis, bump iterations to 3 to get more accurate results
    if (took < (maxMillis / 3)) iterations = 3;
    // do five rounds of hashing with those iterations and measure the execution time
    int[] measurements = new int[5];
    for (int i = 0; i < measurements.length; i++) {
        start = System.nanoTime();
        argon.hash(iterations, m, p, "password".toCharArray());
        // divide the measurement by the number of iterations to approximate the time one iteration takes
        measurements[i] = measure(start) / iterations;
    }
    // get the average time one iteration took, clamped to at least 1 ms to avoid division by zero below
    var avg = Math.max(1.0, Arrays.stream(measurements).average().orElse(1.0));
    // extrapolate the number of iterations that stays within maxMillis, with a lower bound of 1 iteration
    return Math.max(1, (int) (maxMillis / avg));
}
```
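The extrapolation at the end can be sanity-checked in isolation. Below is a minimal, self-contained sketch of just that arithmetic; the Argon2 calls are replaced by a hypothetical per-iteration time, so nothing here is the library's API:

```java
public class IterationEstimate {
    // extrapolate the iteration count from an average per-iteration time,
    // using the same rounding and lower bound as the method above
    static int estimateIterations(int maxMillis, double avgMillisPerIteration) {
        // clamp the average to at least 1 ms to avoid division by zero
        double avg = Math.max(1.0, avgMillisPerIteration);
        return Math.max(1, (int) (maxMillis / avg));
    }

    public static void main(String[] args) {
        // hypothetical numbers: ~6 ms per iteration, 1000 ms budget -> 166 iterations
        System.out.println(estimateIterations(1000, 6.0));
        // one iteration already slower than the budget -> lower bound of 1
        System.out.println(estimateIterations(1000, 1500.0));
    }
}
```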
My custom implementation vastly outperforms the library method, and the resulting $$t$$ values are within a very similar range. Especially for large numbers of iterations, the method as provided in the library takes a long time. See the comparison below.
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 8192 kiB | 1 | $$t = 129$$ | 848 ms | 54997 ms |
| my method | 1000 | 8192 kiB | 1 | $$t = 166$$ | 1005 ms | 99 ms |
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 65536 kiB | 1 | $$t = 18$$ | 967 ms | 10751 ms |
| my method | 1000 | 65536 kiB | 1 | $$t = 18$$ | 1028 ms | 875 ms |
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 1048576 kiB | 1 | $$t = 1$$ | 1135 ms | 1131 ms |
| my method | 1000 | 1048576 kiB | 1 | $$t = 1$$ | 1136 ms | 1135 ms |
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 8192 kiB | 4 | $$t = 303$$ | 939 ms | 145418 ms |
| my method | 1000 | 8192 kiB | 4 | $$t = 333$$ | 1002 ms | 57 ms |
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 65536 kiB | 4 | $$t = 42$$ | 920 ms | 20730 ms |
| my method | 1000 | 65536 kiB | 4 | $$t = 47$$ | 977 ms | 351 ms |
| variant | max millis | memory | parallelism | result | hash time (avg of 5) | $$t$$ found in |
|---|---|---|---|---|---|---|
| library method | 1000 | 1048576 kiB | 4 | $$t = 2$$ | 872 ms | 2623 ms |
| my method | 1000 | 1048576 kiB | 4 | $$t = 2$$ | 791 ms | 2411 ms |
> [!NOTE]
> The second-to-last column shows the time a hash with the evaluated $$t$$ parameter takes, measured by hashing 5 times and taking the average. The last column shows how long it took to find the number of iterations.
Hello,
thanks for the report! I agree, the current implementation is very naive. It's not intended to be run all the time, just once on new hardware, with the iteration count then put in a config file somewhere.
But looking at your results, I think we can use your algorithm, as I don't see any downsides. Would you mind creating a PR?
Yeah, that was how I was running this until now: figuring out the iterations once and then keeping them fixed. However, I found that in some real-world scenarios this led to imperfect results, which is why I'm shifting to a more dynamic approach.
- Sometimes the method ran while the system was otherwise under high load (cron jobs, backups, other applications, ...), which led to weaker hashes. I'm now taking multiple samples at random times and using the mean value
- Some applications run for months at a time, and the circumstances on the system may change during that lifetime (more applications get installed on the same hardware, competing for resources); in some cases this led to bad UX because of long authentication times
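To illustrate the sampling idea from the two points above, here is a minimal, self-contained sketch. The actual timing source is stubbed out as a hypothetical supplier (in the real application each sample would be a fresh `findIterations` run triggered at a random point in time); only the averaging across samples is shown:

```java
import java.util.function.IntSupplier;

public class IterationSampler {
    // average a number of samples; each sample is one iteration-count estimate,
    // taken at a random time to smooth out temporary load spikes
    static double meanOfSamples(IntSupplier sampleSource, int samples) {
        long sum = 0;
        for (int i = 0; i < samples; i++) {
            sum += sampleSource.getAsInt();
        }
        return (double) sum / samples;
    }

    public static void main(String[] args) {
        // hypothetical results of five runs at random times:
        // one run happened under load and produced a lower count
        int[] sampled = {166, 160, 120, 170, 164};
        int[] idx = {0};
        double mean = meanOfSamples(() -> sampled[idx[0]++], sampled.length);
        System.out.println(Math.round(mean)); // prints 156
    }
}
```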
Sure, my method above has a few quirks; I can patch them up and provide a PR sometime :+1: