Allow --dupe-suppression in batch mode


Dupe suppression is enabled by default in batch mode (because it includes wordlist+rules), but the command-line option to control it is not allowed (technically because no other option specifies the use of rules). Ideally, we'd allow this option to make it easier to explicitly disable the dupe suppressor or adjust its memory usage. Right now, the only ways to do that are editing the configuration file or overriding the configuration file name.

This may be a bit tricky to implement, which is why it wasn't done right away.

Aug 15 '24 12:08 solardiz

This was tricky to figure out, but looks easy if we accept the side-effects as desirable:

commit cd8d511847dca14d0cbadabc2d6587ae99cf5352 (HEAD -> fixes-20240923)
Author: Solar <[email protected]>
Date:   Mon Sep 23 02:50:30 2024 +0200

    Options: Allow --rules and/or --dupe-suppression in batch mode
    
    Fixes #5524

diff --git a/src/options.h b/src/options.h
index d0a71b01e..09d82605e 100644
--- a/src/options.h
+++ b/src/options.h
@@ -69,7 +69,7 @@
        (FLG_EXTERNAL_CHK | FLG_ACTION | FLG_CRACKING_SUP | FLG_PWD_SUP)
 /* Batch cracker */
 #define FLG_BATCH_CHK                  0x0000000000004000ULL
-#define FLG_BATCH_SET                  (FLG_BATCH_CHK | FLG_CRACKING_SET)
+#define FLG_BATCH_SET                  (FLG_BATCH_CHK | FLG_CRACKING_SET | FLG_RULES_ALLOW)
 /* Stdout mode */
 #define FLG_STDOUT                     0x0000000000008000ULL
 /* Restoring an interrupted session */

This allows not only --dupe-suppression, but also --rules with and without parameter (in the latter case, it's a no-op since our default rules are enabled in batch mode by default) and even --rules-stack.
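To illustrate why adding one bit to FLG_BATCH_SET is enough to let these options through, here is a minimal sketch of mask-based option gating. It is not John's actual option parser: only FLG_BATCH_CHK's value is taken from the diff above, while the other flag values, the _OLD/_NEW macro names, and the option_allowed() helper are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; just FLG_BATCH_CHK matches the diff above. */
#define FLG_CRACKING_SET   0x0000000000000002ULL /* hypothetical value */
#define FLG_RULES_ALLOW    0x0000000000000400ULL /* hypothetical value */
#define FLG_BATCH_CHK      0x0000000000004000ULL

/* The two variants of the macro, before and after the commit. */
#define FLG_BATCH_SET_OLD  (FLG_BATCH_CHK | FLG_CRACKING_SET)
#define FLG_BATCH_SET_NEW  (FLG_BATCH_CHK | FLG_CRACKING_SET | FLG_RULES_ALLOW)

/*
 * Hypothetical helper: an option is accepted only if every bit it
 * requires is present in the flags implied by the selected mode.
 */
static int option_allowed(uint64_t mode_flags, uint64_t required)
{
	assert(required != 0); /* an option with no requirements needs no check */
	return (mode_flags & required) == required;
}
```

Under these assumptions, option_allowed(FLG_BATCH_SET_OLD, FLG_RULES_ALLOW) is false and option_allowed(FLG_BATCH_SET_NEW, FLG_RULES_ALLOW) is true, which is also why --rules and --rules-stack become accepted as a side effect.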

However, specifying only --rules-stack (without --rules) somehow results in weird behavior where the process almost locks up or runs very slowly once it reaches pass 2 (wordlist + rules), but only with some rulesets (e.g., with --rules-stack=o1).

Sep 23 '24 01:09 solardiz

Are you saying it somehow doesn't behave like -w -ru -rules-stack=o1?

Sep 24 '24 21:09 magnumripper

> Are you saying it somehow doesn't behave like -w -ru -rules-stack=o1?

Yes. ./john pw -rules-stack=o1 vs. ./john pw -w -ru -rules-stack=o1 behave differently - the former feels somewhat unresponsive and shows speeds that are ~1000 times lower than the latter's (when both are run against the same fast hashes). I am puzzled.

Sep 25 '24 00:09 solardiz

Something is b0rken for sure.

john.conf:

# Default Single mode rules
SingleRules = prepend

# Default batch mode Wordlist rules
BatchModeWordlistRules = prepend

[List.Rules:prepend]
^[ABC]

[List.Rules:append]
$[xyz]

$ ../run/john 10.sam -form:nt -rules-stack:append -log 2>&1 | head -100
Using default input encoding: UTF-8
2024-09-27 03:16:10 0:00:00:00 Starting a new session
2024-09-27 03:16:10 0:00:00:00 Loaded a total of 10 password hashes with no different salts
Loaded 10 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Warning: no OpenMP support for this hash type, consider --fork=16
2024-09-27 03:16:10 0:00:00:00 Command line: ../run/john 10.sam --format=nt --rules-stack=append --log-stderr 
2024-09-27 03:16:10 0:00:00:00 - UTF-8 input encoding enabled
2024-09-27 03:16:10 0:00:00:00 - Passwords in this logfile are UTF-8 encoded
2024-09-27 03:16:10 0:00:00:00 - Passwords will be stored UTF-8 encoded in .pot file
2024-09-27 03:16:10 0:00:00:00 - Hash type: NT (min-len 0, max-len 27)
2024-09-27 03:16:10 0:00:00:00 - Algorithm: MD4 256/256 AVX2 8x3
Note: Passwords longer than 27 rejected
2024-09-27 03:16:10 0:00:00:00 - Configured to use otherwise idle processor cycles only
2024-09-27 03:16:10 0:00:00:00 - Will reject candidates longer than 81 bytes
2024-09-27 03:16:10 0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 192
2024-09-27 03:16:10 0:00:00:00 Proceeding with "single crack" mode
Proceeding with single, rules:(prepend x append)
2024-09-27 03:16:10 0:00:00:00 - SingleWordsPairMax used is 6
2024-09-27 03:16:10 0:00:00:00 - SingleRetestGuessed = true
2024-09-27 03:16:10 0:00:00:00 - SingleMaxBufferSize = 22 GiB
2024-09-27 03:16:10 0:00:00:00 - SinglePrioResume = N (prioritize speed over resumability)
2024-09-27 03:16:10 0:00:00:00 + Stacked rules: append
2024-09-27 03:16:10 0:00:00:00 - 3 preprocessed stacked rules
2024-09-27 03:16:10 0:00:00:00 - Total 9 (3 x 3) preprocessed word mangling rules

Note how it says "9 rules (3 x 3)" here, which is correct.

2024-09-27 03:16:10 0:00:00:00 - Allocated 1 buffer of 24 candidate passwords (total 130 KiB)
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
2024-09-27 03:16:10 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:16:10 0:00:00:00 - Rule #2: '^B' accepted
2024-09-27 03:16:10 0:00:00:00 - Rule #3: '^C' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #2, stacked #1
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #1: '$y' accepted
2024-09-27 03:16:10 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #3, stacked #2
2024-09-27 03:16:10 0:00:00:00 - Rule #2: '^B' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #1, stacked #2
2024-09-27 03:16:10 0:00:00:00 - Rule #3: '^C' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #2, stacked #2
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #2: '$z' accepted
2024-09-27 03:16:10 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #3, stacked #3
2024-09-27 03:16:10 0:00:00:00 - Rule #2: '^B' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #1, stacked #3
2024-09-27 03:16:10 0:00:00:00 - Rule #3: '^C' accepted
2024-09-27 03:16:10 0:00:00:00 - Oldest still in use rules are now base #2, stacked #3
2024-09-27 03:16:10 0:00:00:00 - Processing the remaining buffered candidate passwords, if any
Almost done: Processing the remaining buffered candidate passwords, if any.
0g 0:00:00:00 DONE 1/3 (2024-09-27 03:16) 0g/s 20400p/s 20400c/s 204000C/s CAssistantHelpz..Cadminz

The above looks normal. Apparently the stacked rules are actually applied differently in single mode: First the whole mode runs through stacked rule 1, then everything rewinds and we go through stacked rule 2, and so on. My memory of implementing it is very faint but it was likely easiest to add that way, or perhaps just least intrusive. Now let's see what happens next:

2024-09-27 03:16:10 0:00:00:00 Proceeding with wordlist mode
2024-09-27 03:16:10 0:00:00:00 - Rules: prepend
Proceeding with wordlist:../run/password.lst, rules-stack:append
2024-09-27 03:16:10 0:00:00:00 - Wordlist file: ../run/password.lst
2024-09-27 03:16:10 0:00:00:00 - memory mapping wordlist (15327454 bytes)
2024-09-27 03:16:10 0:00:00:00 - Total 3 (3 x 1) preprocessed word mangling rules
2024-09-27 03:16:10 0:00:00:00 + Stacked rules: append
2024-09-27 03:16:10 0:00:00:00 - 3 preprocessed stacked rules

Here's the first sign of some change that may have caused the regression. The total should be printed later and say "9 (3 x 3)" just like in single mode. This problem isn't just with batch mode, it happens with -w -ru:prepend -rules-stack:append now as well.

2024-09-27 03:16:10 0:00:00:00 Enabling duplicate candidate password suppressor
Enabling duplicate candidate password suppressor
2024-09-27 03:16:10 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #2: '$y' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #3: '$z' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #2: '$y' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #3: '$z' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #2: '$y' accepted
2024-09-27 03:16:10 0:00:00:00 + Stacked Rule #3: '$z' accepted
(...)

...and then it seems to get stuck in an infinite loop. Note that the SAME happens using -w -ru:prepend -rules-stack:append.

Not sure what caused this, we should be able to bisect it.

Sep 27 '24 01:09 magnumripper

Hmm, maybe there wasn't an infinite loop after all.

words.lst

alpha
bravo
charlie

$ ../run/john 10.sam -form:nt -w:words.lst -ru:prepend -rules-stack:append -log -v:3
Using default input encoding: UTF-8
2024-09-27 03:33:37 0:00:00:00 Starting a new session
2024-09-27 03:33:37 0:00:00:00 Loaded a total of 10 password hashes with no different salts
Loaded 10 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Warning: no OpenMP support for this hash type, consider --fork=16
2024-09-27 03:33:37 0:00:00:00 Command line: ../run/john 10.sam --format=nt --wordlist=words.lst --rules=prepend --rules-stack=append --log-stderr --verbosity=3 
2024-09-27 03:33:37 0:00:00:00 - UTF-8 input encoding enabled
2024-09-27 03:33:37 0:00:00:00 - Passwords in this logfile are UTF-8 encoded
2024-09-27 03:33:37 0:00:00:00 - Passwords will be stored UTF-8 encoded in .pot file
2024-09-27 03:33:37 0:00:00:00 - Hash type: NT (min-len 0, max-len 27)
2024-09-27 03:33:37 0:00:00:00 - Algorithm: MD4 256/256 AVX2 8x3
Note: Passwords longer than 27 rejected
2024-09-27 03:33:37 0:00:00:00 - Configured to use otherwise idle processor cycles only
2024-09-27 03:33:37 0:00:00:00 - Will reject candidates longer than 81 bytes
2024-09-27 03:33:37 0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 192
2024-09-27 03:33:37 0:00:00:00 Proceeding with wordlist mode
2024-09-27 03:33:37 0:00:00:00 - Rules: prepend
2024-09-27 03:33:37 0:00:00:00 - Wordlist file: words.lst
2024-09-27 03:33:37 0:00:00:00 - memory mapping wordlist (20 bytes)
2024-09-27 03:33:37 0:00:00:00 - loading wordfile words.lst into memory (20 bytes, max_size=150000000)
2024-09-27 03:33:37 0:00:00:00 - wordfile had 3 lines and required 24 bytes for index.
2024-09-27 03:33:37 0:00:00:00 - Total 3 (3 x 1) preprocessed word mangling rules
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
2024-09-27 03:33:37 0:00:00:00 + Stacked rules: append
2024-09-27 03:33:37 0:00:00:00 - 3 preprocessed stacked rules
2024-09-27 03:33:37 0:00:00:00 Enabling duplicate candidate password suppressor
Enabling duplicate candidate password suppressor
2024-09-27 03:33:38 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:33:38 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:33:38 0:00:00:00 + Stacked Rule #2: '$y' accepted
2024-09-27 03:33:38 0:00:00:00 + Stacked Rule #3: '$z' accepted
2024-09-27 03:33:38 0:00:00:00 - Some rule logging suppressed. Re-enable with --verbosity=4 or greater
2024-09-27 03:33:38 0:00:00:00 - Rule #2: '^B' accepted
2024-09-27 03:33:38 0:00:00:00 - Rule #3: '^C' accepted
0g 0:00:00:00 DONE (2024-09-27 03:33) 0g/s 168.8p/s 168.8c/s 1687C/s Aalphax..Ccharliez
2024-09-27 03:33:38 0:00:00:00 Session completed

I had my default verbosity bumped (to see OpenCL build warnings 🙄), and that emitted the "Stacked Rule #..." line for every word.
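For reference, the candidate order implied by the log above can be sketched as three nested loops: base rule outermost, then word, then stacked rule. This is a toy model rather than John's rule engine. It hard-codes the three-word list and the prepend/append rulesets from this test, ignores the dupe suppressor, and all names in it (generate(), CAND_MAX, and so on) are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

#define N_WORDS   3
#define N_BASE    3
#define N_STACKED 3
#define N_CANDS   (N_WORDS * N_BASE * N_STACKED)
#define CAND_MAX  32

static const char *const words[N_WORDS] = { "alpha", "bravo", "charlie" };
static const char base_rules[N_BASE] = { 'A', 'B', 'C' };       /* ^A ^B ^C */
static const char stacked_rules[N_STACKED] = { 'x', 'y', 'z' }; /* $x $y $z */

/*
 * Fills out[] in the order suggested by the wordlist-mode log:
 * for each base rule, for each word, apply every stacked rule.
 * Returns the number of candidates written.
 */
static int generate(char out[][CAND_MAX])
{
	int n = 0;

	for (int b = 0; b < N_BASE; b++)
		for (int w = 0; w < N_WORDS; w++)
			for (int s = 0; s < N_STACKED; s++) {
				size_t len = strlen(words[w]);

				assert(len + 3 <= CAND_MAX);
				out[n][0] = base_rules[b];          /* ^X prepends */
				memcpy(out[n] + 1, words[w], len);
				out[n][len + 1] = stacked_rules[s]; /* $X appends */
				out[n][len + 2] = '\0';
				n++;
			}
	return n;
}
```

Called with a `char out[N_CANDS][CAND_MAX]` buffer, this yields 27 candidates running from Aalphax to Ccharliez, matching the range in the status line of the run above.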

Sep 27 '24 01:09 magnumripper

With that confusion cleared, here's batch mode, using the same short word list:

$ ../run/john 10.sam -form:nt -rules-stack:append -log -v:3
Using default input encoding: UTF-8
2024-09-27 03:38:55 0:00:00:00 Starting a new session
2024-09-27 03:38:55 0:00:00:00 Loaded a total of 10 password hashes with no different salts
Loaded 10 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Warning: no OpenMP support for this hash type, consider --fork=16
2024-09-27 03:38:55 0:00:00:00 Command line: ../run/john 10.sam --format=nt --rules-stack=append --log-stderr --verbosity=3 
2024-09-27 03:38:55 0:00:00:00 - UTF-8 input encoding enabled
2024-09-27 03:38:55 0:00:00:00 - Passwords in this logfile are UTF-8 encoded
2024-09-27 03:38:55 0:00:00:00 - Passwords will be stored UTF-8 encoded in .pot file
2024-09-27 03:38:55 0:00:00:00 - Hash type: NT (min-len 0, max-len 27)
2024-09-27 03:38:55 0:00:00:00 - Algorithm: MD4 256/256 AVX2 8x3
Note: Passwords longer than 27 rejected
2024-09-27 03:38:55 0:00:00:00 - Configured to use otherwise idle processor cycles only
2024-09-27 03:38:55 0:00:00:00 - Will reject candidates longer than 81 bytes
2024-09-27 03:38:55 0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 192
2024-09-27 03:38:55 0:00:00:00 Proceeding with "single crack" mode
Proceeding with single, rules:(prepend x append)
2024-09-27 03:38:55 0:00:00:00 - SingleWordsPairMax used is 6
2024-09-27 03:38:55 0:00:00:00 - SingleRetestGuessed = true
2024-09-27 03:38:55 0:00:00:00 - SingleMaxBufferSize = 23 GiB
2024-09-27 03:38:55 0:00:00:00 - SinglePrioResume = N (prioritize speed over resumability)
2024-09-27 03:38:55 0:00:00:00 + Stacked rules: append
2024-09-27 03:38:55 0:00:00:00 - 3 preprocessed stacked rules
2024-09-27 03:38:55 0:00:00:00 - Total 9 (3 x 3) preprocessed word mangling rules
2024-09-27 03:38:55 0:00:00:00 - Allocated 1 buffer of 24 candidate passwords (total 130 KiB)
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
2024-09-27 03:38:55 0:00:00:00 - Rule #1: '^A' accepted
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:38:55 0:00:00:00 - Rule #2: '^B' accepted
2024-09-27 03:38:55 0:00:00:00 - Rule #3: '^C' accepted
2024-09-27 03:38:55 0:00:00:00 - Oldest still in use rules are now base #2, stacked #1
2024-09-27 03:38:55 0:00:00:00 - Some rule logging suppressed. Re-enable with --verbosity=4 or greater
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #1: '$y' accepted
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #2: '$z' accepted
2024-09-27 03:38:55 0:00:00:00 - Processing the remaining buffered candidate passwords, if any
Almost done: Processing the remaining buffered candidate passwords, if any.
0g 0:00:00:00 DONE 1/3 (2024-09-27 03:38) 0g/s 20400p/s 20400c/s 204000C/s CAssistantHelpz..Cadminz
2024-09-27 03:38:55 0:00:00:00 Proceeding with wordlist mode
2024-09-27 03:38:55 0:00:00:00 - Rules: prepend
Proceeding with wordlist:words.lst, rules-stack:append
2024-09-27 03:38:55 0:00:00:00 - Wordlist file: words.lst
2024-09-27 03:38:55 0:00:00:00 - memory mapping wordlist (20 bytes)
2024-09-27 03:38:55 0:00:00:00 - Total 3 (3 x 1) preprocessed word mangling rules
2024-09-27 03:38:55 0:00:00:00 + Stacked rules: append
2024-09-27 03:38:55 0:00:00:00 - 3 preprocessed stacked rules
2024-09-27 03:38:55 0:00:00:00 Enabling duplicate candidate password suppressor
Enabling duplicate candidate password suppressor
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #1: '$x' accepted
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #2: '$y' accepted
2024-09-27 03:38:55 0:00:00:00 + Stacked Rule #3: '$z' accepted
2024-09-27 03:38:55 0:00:00:00 - Some rule logging suppressed. Re-enable with --verbosity=4 or greater
0g 0:00:00:00 DONE 2/3 (2024-09-27 03:38) 0g/s 1493p/s 1493c/s 14937C/s Aalphax..Ccharliexyyzz
2024-09-27 03:38:55 0:00:00:00 Proceeding with "incremental" mode: ASCII
Proceeding with incremental:ASCII, rules-stack:append
2024-09-27 03:38:55 0:00:00:00 - Lengths 0 to 13, up to 95 different characters
2024-09-27 03:38:55 0:00:00:00 + Stacked rules: append
2024-09-27 03:38:55 0:00:00:00 - 3 preprocessed stacked rules
(...)

Note though that the stacked rules are applied to incremental mode as well (as designed).

Perhaps the "3 (3 x 1) rules" is merely cosmetic, from some change that didn't cause other trouble.

Note that running with verbosity 4 or higher can hit performance drastically due to logging alone! Perhaps that was/is the problem?

Sep 27 '24 01:09 magnumripper

> Note that running with verbosity 4 or higher can hit performance drastically due to logging alone! Perhaps that was/is the problem?

That is not it. Running batch mode with -rules-stack=o1 -v:3 starts at 1Mp/s or so and decreases over time. Running -w -ru -rules-stack=o1 -v:3 runs steadily at 5-6Mp/s.

Sep 27 '24 01:09 magnumripper

Re-opening so that we don't forget to look into the new (or remaining) weirdness further, although we may instead want to open a separate issue for that (please feel free to do that @magnumripper).

Sep 28 '24 13:09 solardiz

I think I've just fixed the --rules-stack weirdness by adding FLG_BATCH_CHK checks to the 4 places in rules.c where we determine rules_stacked_after. Alternatively, we could temporarily set FLG_RULES_CHK in pass 2 of batch mode, like we already do for FLG_SINGLE_CHK. Anyhow, I find it wrong that rules.c looks at these flags at all - ideally, we'd have this at a higher abstraction level. I also don't like that, while do_wordlist_crack has a rules Boolean argument, we don't use it consistently, sometimes looking at the option flags instead.
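The exact expressions in rules.c aren't quoted in this thread, so the following is only a guess at the shape of the check, meant to show what "adding FLG_BATCH_CHK checks" amounts to. The flag values other than FLG_BATCH_CHK and both function names are hypothetical, and the assumption that the pre-fix condition tested exactly FLG_RULES_CHK and FLG_SINGLE_CHK is mine.

```c
#include <assert.h>
#include <stdint.h>

#define FLG_SINGLE_CHK 0x0000000000000004ULL /* hypothetical value */
#define FLG_RULES_CHK  0x0000000000000200ULL /* hypothetical value */
#define FLG_BATCH_CHK  0x0000000000004000ULL /* from options.h above */

/* Assumed pre-fix condition: batch mode on its own does not qualify. */
static int stacked_after_before_fix(uint64_t flags)
{
	return !!(flags & (FLG_RULES_CHK | FLG_SINGLE_CHK));
}

/* Same condition with the extra FLG_BATCH_CHK check. */
static int stacked_after_with_fix(uint64_t flags)
{
	return !!(flags & (FLG_RULES_CHK | FLG_SINGLE_CHK | FLG_BATCH_CHK));
}

/* Usage example: only the fixed variant recognizes plain batch mode. */
static void example(void)
{
	assert(!stacked_after_before_fix(FLG_BATCH_CHK));
	assert(stacked_after_with_fix(FLG_BATCH_CHK));
}
```

In this model, the alternative mentioned above (temporarily setting FLG_RULES_CHK during pass 2) would make even the unmodified condition true, at the cost of mutating the option flags mid-run.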

Anyway, if the current fix works, then let's close this issue.

@magnumripper Can you please check using your test cases? Thank you!

Sep 29 '24 02:09 solardiz

FWIW I still get the out of order stats:

2024-09-30 08:56:56 0:00:00:00 - Total 3 (3 x 1) preprocessed word mangling rules
2024-09-30 08:56:56 0:00:00:00 + Stacked rules: append
2024-09-30 08:56:56 0:00:00:00 - 3 preprocessed stacked rules

Should be:

2024-09-30 08:56:56 0:00:00:00 + Stacked rules: append
2024-09-30 08:56:56 0:00:00:00 - 3 preprocessed stacked rules
2024-09-30 08:56:56 0:00:00:00 - Total 9 (3 x 3) preprocessed word mangling rules

I'd like to find what broke it.

Sep 30 '24 07:09 magnumripper

> I still get the out of order stats:

Is this related to my allowing for stacked rules in batch mode or is it a separate/older issue?

Sep 30 '24 13:09 solardiz

> > I still get the out of order stats:
>
> Is this related to my allowing for stacked rules in batch mode or is it a separate/older issue?

Way older. I tried to bisect but failed (once back at 2020 or so, today's compilers refuse to build for all sorts of odd reasons - I'd need to set up some older version of Linux). I will try to figure it out without bisecting.

Nov 07 '24 15:11 magnumripper

OK, this appears to be a purely cosmetic problem now, and 'git blame' indicates it might have been this way ever since stacked rules were merged. Here's a fix for it:

diff --git a/src/wordlist.c b/src/wordlist.c
index e2c827ff8..cbd5cd745 100644
--- a/src/wordlist.c
+++ b/src/wordlist.c
@@ -1037,16 +1037,6 @@ REDO_AFTER_LMLOOP:
                rules_init(db, length);
                rule_count = rules_count(&ctx, -1);
 
-               if (do_lmloop || !db->plaintexts->head) {
-                       if (rules_stacked_after)
-                               log_event("- Total %u (%d x %u) preprocessed word mangling rules",
-                                         rule_count * crk_stacked_rule_count,
-                                         rule_count, crk_stacked_rule_count);
-                       else
-                               log_event("- %d preprocessed word mangling rules", rule_count);
-               }
-
-
                apply = rules_apply;
        } else {
                rule_ctx = NULL;
@@ -1077,6 +1067,15 @@ REDO_AFTER_LMLOOP:
                crk_init(db, fix_state, NULL);
        }
 
+       if (rules && (do_lmloop || !db->plaintexts->head)) {
+               if (rules_stacked_after)
+                       log_event("- Total %u (%d x %u) preprocessed word mangling rules",
+                                 rule_count * crk_stacked_rule_count,
+                                 rule_count, crk_stacked_rule_count);
+               else
+                       log_event("- %d preprocessed word mangling rules", rule_count);
+       }
+
        if (dupeCheck || rules) {
                int force = (dupeCheck || (options.flags & FLG_STDOUT)) && options.suppressor_size;
                suppressor_init(SUPPRESSOR_UPDATE | (force ? SUPPRESSOR_FORCE : 0));

This moves the logging to after crk_init(), from which stacked rules are initialized. Given it's only log output, this should be 100% safe to commit... but those are famous last words so I'm not sure we should bother 😆
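The ordering problem can be reproduced in isolation. The sketch below is a toy model, not the real code: it assumes crk_stacked_rule_count holds 1 until the stacked rules are initialized (which is what the "3 x 1" in the logs suggests), and init_stacked_rules(), total_rules(), and log_total() are hypothetical stand-ins for the relevant parts of crk_init() and log_event().

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the variables named in the diff above. */
static unsigned int crk_stacked_rule_count = 1; /* assumed value before init */
static int rule_count = 3;

/* Hypothetical stand-in for the stacked-rules setup done from crk_init(). */
static void init_stacked_rules(unsigned int n)
{
	assert(n > 0);
	crk_stacked_rule_count = n;
}

/* The product that the log message reports as the total. */
static unsigned int total_rules(void)
{
	return (unsigned int)rule_count * crk_stacked_rule_count;
}

/* Prints the same message format as the log_event() call in the diff. */
static void log_total(void)
{
	printf("- Total %u (%d x %u) preprocessed word mangling rules\n",
	       total_rules(), rule_count, crk_stacked_rule_count);
}
```

Calling log_total() before init_stacked_rules(3) prints the "Total 3 (3 x 1)" line seen in the logs, while calling it afterwards prints "Total 9 (3 x 3)", which is the effect of moving the logging below crk_init().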

Nov 07 '24 15:11 magnumripper

@magnumripper Since the remaining issue is much older, I suggest we close this issue and you may open a separate issue/PR for that older issue.

Nov 07 '24 16:11 solardiz

Yes... or maybe I just make a PR with that diff above? I really can't see any dragons hiding there.

Nov 07 '24 20:11 magnumripper

> maybe I just make a PR with that diff above? I really can't see any dragons hiding there.

Sure, that's what I meant by "open a separate issue/PR for that older issue".

Nov 07 '24 20:11 solardiz
