john
john copied to clipboard
Misreported c/s and C/s at high speeds
It looks like running nt-opencl with mask on Vega 64 overflows some math leading to incorrect c/s and C/s figures:
$ ./john -form=nt-opencl -mask='?a' -min-len=6 -max-len=8 -dev=1,4,5 -fork=3 pw
Using default input encoding: UTF-8
Loaded 10000 password hashes with no different salts (NT-opencl [MD4 OpenCL])
Remaining 159 password hashes with no different salts
Node numbers 1-3 of 3 (fork)
Device 1: gfx900 [Radeon RX Vega]
Device 4: GeForce GTX 1080
Device 5: GeForce GTX TITAN X
Build log: /tmp/OCL19185T1.cl:332:18: warning: unknown attribute 'max_constant_size' ignored
__attribute__((max_constant_size (NUM_INT_KEYS * 4)))
^
1 warning generated.
Press 'q' or Ctrl-C to abort, almost any other key for status
Build log: /tmp/OCL19185T2.cl:332:18: warning: unknown attribute 'max_constant_size' ignored
__attribute__((max_constant_size (NUM_INT_KEYS * 4)))
^
1 warning generated.
1 0g 0:00:00:22 1.95% (7) (ETA: 23:08:58) 0g/s 20816Mp/s 2074Mc/s 838488TC/s aa`8Bga..aa\7mpa
2 0g 0:00:00:23 1.04% (7) (ETA: 23:26:57) 0g/s 10653Mp/s 10653Mc/s 1693GC/s Dev#4:56C aafVfVf..aa][zVf
3 0g 0:00:00:25 1.04% (7) (ETA: 23:30:09) 0g/s 9801Mp/s 9801Mc/s 1558GC/s Dev#5:67C [email protected]
3 0g 0:00:02:37 7.21% (7) (ETA: 23:26:26) 0g/s 10804Mp/s 10804Mc/s 1717GC/s Dev#5:83C aavK4jW..aaav6jW
2 0g 0:00:02:37 9.44% (7) (ETA: 23:17:52) 0g/s 14141Mp/s 14141Mc/s 2248GC/s Dev#4:91C aa>"PzA..aaY9JzA
1 0g 0:00:02:37 16.43% (7) (ETA: 23:06:04) 0g/s 24616Mp/s 2293Mc/s 117495TC/s aa2iQ:o..aa:tL[o
I think the p/s figures are correct, and all are correct for the NVIDIA devices, but the c/s and C/s for the Vega are way off.
We might find a way to bodge-fix this but I think the canonical solution is to start using uint64_t where needed, instead of using int and later treating it as unsigned int - which might not always be enough and I reckon that's the case here.
Edit: Given crypt_all() can return -1 (or did I dream that?) we might need int64_t
We have many instances of these in mask-aware formats:
#if SIZEOF_SIZE_T > 4
/* We can't process more than 4G keys per crypt() */
while (gws_limit * mask_int_cand.num_int_cand > 0xffffffffUL)
gws_limit >>= 1;
#endif
However, our crypt_all still uses int for *pcount and for its return value. So by allowing for up to 4G rather than 2G there, we accept triggering int overflows, which we then hopefully correct with implicit typecasts/promotions e.g. in the call to status_update_crypts:
count = crk_key_index;
match = crk_methods.crypt_all(&count, salt);
crk_last_key = count;
status_update_crypts((uint64_t)salt->count * count, count);
However, formally this is UB, and we probably shouldn't allow values this high or should update that function's prototype.
I think we should limit this to 2G or change the crypt_all prototype to use unsigned int or unsigned long or ARCH_WORD or uint64_t.
We might find a way to bodge-fix this but I think the canonical solution is to start using
uint64_twhere needed, instead of usingintand later treating it asunsigned int- which might not always be enough and I reckon that's the case here.
uint64_t would have slight performance impact on 32-bit systems.
Edit: Given
crypt_all()can return-1(or did I dream that?) we might needint64_t
No, the description in formats.h doesn't specify that and the code in cracker.c doesn't expect it.
Edit: Given
crypt_all()can return-1(or did I dream that?) we might needint64_tNo, the description in
formats.hdoesn't specify that and the code incracker.cdoesn't expect it.
Yes but we currently have this:
/* Use this macro for OpenCL Error handling in crypt_all() */
#define BENCH_CLERROR(cl_error, message) \
do { cl_int __err = (cl_error); \
if (__err != CL_SUCCESS) { \
if (!ocl_autotune_running || options.verbosity >= VERB_MAX) \
fprintf(stderr, "%u: OpenCL %s error in %s:%d - %s\n", \
NODE, get_error_name(__err), __FILE__, __LINE__, message); \
else if (options.verbosity > VERB_LEGACY) \
fprintf(stderr, " %u: %s\n", NODE, get_error_name(__err)); \
if (!(ocl_autotune_running || bench_or_test_running)) \
error(); \
else \
return -1; \
} \
} while (0)
We should change this (not sure to what), or perhaps change formats.h to allow and document it.
I think we should limit this to 2G or change the
crypt_allprototype to useunsigned intorunsigned longorARCH_WORDoruint64_t.
Of those, I think ARCH_WORD is best. It would allow the mentioned BENCH_CLERROR, it would allow high figures on high-end platforms and it wouldn't affect 32-bit builds.
Oh, that BENCH_CLERROR is indeed problematic - not because of crypt_all return value type, but because our current calls to crypt_all don't expect a -1 return. We need to either remove the return -1; (maybe use return 0; there, for "nothing cracked" in testing, as long as we keep the error(); for actual cracking) or handle -1 in the callers. Do any other uses of BENCH_CLERROR expect the -1?
We obviously do handle a -1 return in opencl_common.c (the autotune functions). Perhaps we should tweak that macro so it can only return -1 if ocl_autotune_running and nothing else.
Apparently @claudioandre-br changed it in 462f2b4b6 so it can return -1 in some other situations that autotune. Not sure why.
We could obviously add code in bench.c and formats.c to handle this "error return" but I think what would happen now is just we call cmp_all() with a count of -1 and not much harm done (self-test should fail anyway).
Perhaps we should just change bench_or_test_running to self_test_running so we don't ignore errors while benchmarking (they would be printed though).
Apparently @claudioandre-br changed it in 462f2b4 so it can return -1 in some other situations
Reading the commit message, I would say it was bailing out with error on a --test run. We can revert and test it later to confirm, but it have to be something like that.
I think what you wanted to achieve (and which should often be the case now) is for a self-test failure to not break the whole (all formats) --test (or --test=0) but continue like CPU-formats normally would. And that's generally a good thing - maybe we should keep it mostly as-is unless we see real and/or specific problems with it.
This issue might be partially? fixed by 47008003c1b0494cfbe183b60649e068e51c4b59 - need to review/re-test.
This issue might be partially? fixed by 4700800 - need to review/re-test.
The changes in there look like they'd allow up to exactly 4G (if the mask multiplier happens to be a power of 2). If so, that's wrong. We can support at best 4G-1, and more correctly (with the current code) up to 2G-1. I suggest changing 0x100000000ULL to 0x7fffffff.
Edit: See also #5103