john icon indicating copy to clipboard operation
john copied to clipboard

Misreported c/s and C/s at high speeds

Open solardiz opened this issue 4 years ago • 12 comments

It looks like running nt-opencl with mask on Vega 64 overflows some math leading to incorrect c/s and C/s figures:

$ ./john -form=nt-opencl -mask='?a' -min-len=6 -max-len=8 -dev=1,4,5 -fork=3 pw
Using default input encoding: UTF-8
Loaded 10000 password hashes with no different salts (NT-opencl [MD4 OpenCL])
Remaining 159 password hashes with no different salts
Node numbers 1-3 of 3 (fork)
Device 1: gfx900 [Radeon RX Vega]
Device 4: GeForce GTX 1080
Device 5: GeForce GTX TITAN X
Build log: /tmp/OCL19185T1.cl:332:18: warning: unknown attribute 'max_constant_size' ignored
                __attribute__((max_constant_size (NUM_INT_KEYS * 4)))
                               ^
1 warning generated.

Press 'q' or Ctrl-C to abort, almost any other key for status
Build log: /tmp/OCL19185T2.cl:332:18: warning: unknown attribute 'max_constant_size' ignored
                __attribute__((max_constant_size (NUM_INT_KEYS * 4)))
                               ^
1 warning generated.

1 0g 0:00:00:22 1.95% (7) (ETA: 23:08:58) 0g/s 20816Mp/s 2074Mc/s 838488TC/s aa`8Bga..aa\7mpa
2 0g 0:00:00:23 1.04% (7) (ETA: 23:26:57) 0g/s 10653Mp/s 10653Mc/s 1693GC/s Dev#4:56C aafVfVf..aa][zVf
3 0g 0:00:00:25 1.04% (7) (ETA: 23:30:09) 0g/s 9801Mp/s 9801Mc/s 1558GC/s Dev#5:67C [email protected]
3 0g 0:00:02:37 7.21% (7) (ETA: 23:26:26) 0g/s 10804Mp/s 10804Mc/s 1717GC/s Dev#5:83C aavK4jW..aaav6jW
2 0g 0:00:02:37 9.44% (7) (ETA: 23:17:52) 0g/s 14141Mp/s 14141Mc/s 2248GC/s Dev#4:91C aa>"PzA..aaY9JzA
1 0g 0:00:02:37 16.43% (7) (ETA: 23:06:04) 0g/s 24616Mp/s 2293Mc/s 117495TC/s aa2iQ:o..aa:tL[o

I think the p/s figures are correct, and all are correct for the NVIDIA devices, but the c/s and C/s for the Vega are way off.

solardiz avatar Nov 21 '20 21:11 solardiz

We might find a way to bodge-fix this but I think the canonical solution is to start using uint64_t where needed, instead of using int and later treating it as unsigned int - which might not always be enough and I reckon that's the case here.

Edit: Given crypt_all() can return -1 (or did I dream that?) we might need int64_t

magnumripper avatar Nov 28 '20 00:11 magnumripper

We have many instances of these in mask-aware formats:

#if SIZEOF_SIZE_T > 4
        /* We can't process more than 4G keys per crypt() */
        while (gws_limit * mask_int_cand.num_int_cand > 0xffffffffUL)
                gws_limit >>= 1;
#endif  

However, our crypt_all still uses int for *pcount and for its return value. So by allowing for up to 4G rather than 2G there, we accept triggering int overflows, which we then hopefully correct with implicit typecasts/promotions e.g. in the call to status_update_crypts:

        count = crk_key_index;
        match = crk_methods.crypt_all(&count, salt);
        crk_last_key = count;

        status_update_crypts((uint64_t)salt->count * count, count);

However, formally this is UB, and we probably shouldn't allow values this high or should update that function's prototype.

I think we should limit this to 2G or change the crypt_all prototype to use unsigned int or unsigned long or ARCH_WORD or uint64_t.

We might find a way to bodge-fix this but I think the canonical solution is to start using uint64_t where needed, instead of using int and later treating it as unsigned int - which might not always be enough and I reckon that's the case here.

uint64_t would have slight performance impact on 32-bit systems.

Edit: Given crypt_all() can return -1 (or did I dream that?) we might need int64_t

No, the description in formats.h doesn't specify that and the code in cracker.c doesn't expect it.

solardiz avatar Nov 16 '21 16:11 solardiz

Edit: Given crypt_all() can return -1 (or did I dream that?) we might need int64_t

No, the description in formats.h doesn't specify that and the code in cracker.c doesn't expect it.

Yes but we currently have this:

/* Use this macro for OpenCL Error handling in crypt_all() */
#define BENCH_CLERROR(cl_error, message)	  \
	do { cl_int __err = (cl_error); \
		if (__err != CL_SUCCESS) { \
			if (!ocl_autotune_running || options.verbosity >= VERB_MAX) \
				fprintf(stderr, "%u: OpenCL %s error in %s:%d - %s\n", \
					NODE, get_error_name(__err), __FILE__, __LINE__, message); \
			else if (options.verbosity > VERB_LEGACY) \
				fprintf(stderr, " %u: %s\n", NODE, get_error_name(__err)); \
			if (!(ocl_autotune_running || bench_or_test_running)) \
				error(); \
			else \
				return -1; \
		} \
	} while (0)

We should change this (not sure to what), or perhaps change formats.h to allow and document it.

magnumripper avatar Nov 16 '21 18:11 magnumripper

I think we should limit this to 2G or change the crypt_all prototype to use unsigned int or unsigned long or ARCH_WORD or uint64_t.

Of those, I think ARCH_WORD is best. It would allow the mentioned BENCH_CLERROR, it would allow high figures on high-end platforms and it wouldn't affect 32-bit builds.

magnumripper avatar Nov 16 '21 18:11 magnumripper

Oh, that BENCH_CLERROR is indeed problematic - not because of crypt_all return value type, but because our current calls to crypt_all don't expect a -1 return. We need to either remove the return -1; (maybe use return 0; there, for "nothing cracked" in testing, as long as we keep the error(); for actual cracking) or handle -1 in the callers. Do any other uses of BENCH_CLERROR expect the -1?

solardiz avatar Nov 17 '21 11:11 solardiz

We obviously do handle a -1 return in opencl_common.c (the autotune functions). Perhaps we should tweak that macro so it can only return -1 if ocl_autotune_running and nothing else.

magnumripper avatar Nov 17 '21 11:11 magnumripper

Apparently @claudioandre-br changed it in 462f2b4b6 so it can return -1 in some other situations that autotune. Not sure why.

magnumripper avatar Nov 17 '21 11:11 magnumripper

We could obviously add code in bench.c and formats.c to handle this "error return" but I think what would happen now is just we call cmp_all() with a count of -1 and not much harm done (self-test should fail anyway).

Perhaps we should just change bench_or_test_running to self_test_running so we don't ignore errors while benchmarking (they would be printed though).

magnumripper avatar Nov 17 '21 12:11 magnumripper

Apparently @claudioandre-br changed it in 462f2b4 so it can return -1 in some other situations

Reading the commit message, I would say it was bailing out with error on a --test run. We can revert and test it later to confirm, but it have to be something like that.

claudioandre-br avatar Nov 17 '21 13:11 claudioandre-br

I think what you wanted to achieve (and which should often be the case now) is for a self-test failure to not break the whole (all formats) --test (or --test=0) but continue like CPU-formats normally would. And that's generally a good thing - maybe we should keep it mostly as-is unless we see real and/or specific problems with it.

magnumripper avatar Nov 17 '21 18:11 magnumripper

This issue might be partially? fixed by 47008003c1b0494cfbe183b60649e068e51c4b59 - need to review/re-test.

solardiz avatar Apr 11 '22 17:04 solardiz

This issue might be partially? fixed by 4700800 - need to review/re-test.

The changes in there look like they'd allow up to exactly 4G (if the mask multiplier happens to be a power of 2). If so, that's wrong. We can support at best 4G-1, and more correctly (with the current code) up to 2G-1. I suggest changing 0x100000000ULL to 0x7fffffff.

Edit: See also #5103

solardiz avatar Jul 11 '23 20:07 solardiz