john Feature: print exact matching rule for a crack

During CMIYC, I realized it would be a huge productivity boost to have a command-line option like --display-crack-rule that shows the exact rule that produced a crack. Example output:

bullshit1900   (TBu519)  ('lAz"1900"<+')

This would make it possible to:

* quickly share successful rules with the team
* easily create a new targeted ruleset based only on the rules that gave cracks
* learn rule syntax by example

I suspect that with rulesets like KoreLogic's, once a number of matching rules have been found, this would give a speedup of > 1000x for future cracking.

To implement this, every candidate would need to carry an integer variable associating it with the number/line of the rule that generated it.
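A minimal sketch of that bookkeeping, written in Python for brevity (John itself is written in C). The names `CandidateBuffer`, `add` and `rule_for` are hypothetical and do not correspond to anything in John's source, and the second rule and its line number are made up. The idea is simply a parallel array of rule line numbers indexed the same way as the candidate buffer:

```python
from array import array


class CandidateBuffer:
    """Hypothetical batch buffer that remembers which rule built each candidate."""

    def __init__(self):
        self.candidates = []        # candidate passwords queued for hashing
        self.rule_ids = array("I")  # parallel array: rule line number per candidate

    def add(self, candidate, rule_id):
        self.candidates.append(candidate)
        self.rule_ids.append(rule_id)
        return len(self.candidates) - 1  # index the hashing code would report on a crack

    def rule_for(self, index):
        return self.rule_ids[index]


# Rule line number -> rule text. Line numbers and the second rule are illustrative.
rules = {17: 'lAz"1900"<+', 18: 'lAz"2000"<+'}

buf = CandidateBuffer()
buf.add("bullshit1900", 17)
idx = buf.add("bullshit2000", 18)

# On a crack at position idx, look the rule up and print it next to the password.
print(buf.candidates[idx], rules[buf.rule_for(idx)])
```

Keeping the rule numbers in a separate array rather than inside each candidate means the array could be skipped entirely when the option is off, which is one possible way to avoid paying for the feature when it is unused.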

Aug 11 '20 18:08 rofl0r

Also it might be helpful to print base word too.

Aug 11 '20 18:08 AlekseyCherepanov

A quick way to get something like that is to check the rule expansion in the log (maybe with --log-stderr or LogCrackedPasswords=Y). But there is a problem: the candidates in the hashing buffer may have been generated by different rules, so a crack from an old rule may be reported after the log line about a new rule.

Generating candidates with --stdout --log-stderr would work in a terminal, but the output gets buffered when redirected:

$ echo asdf | ./JohnTheRipper/run/john --verbosity=4 --log-stderr --pipe --stdout --rules=': Az"[0-9][0-9][0-9][0-9]"' 2>&1 | grep -B 1 asdf1999
asdf1998
asdf1999

Simple rules could be matched against the output of --stdout by candidate number. But rules that reject words break that.

So it does not currently seem possible to reliably match a crack with the rule that produced it.
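The ordering problem can be reproduced with a toy simulation. This is only a sketch: `simulate`, the log line texts and the tiny batch size are invented for illustration and do not mirror John's real log format or buffer sizes. Each "rule" here just appends a suffix, and a crack is only detected when a full batch is hashed:

```python
def simulate(words, suffixes, password, batch_size):
    """Return log lines in the order a batch-buffered cracker would emit them."""
    log, batch = [], []

    def flush():
        # "Hash" the whole batch; cracks are only noticed at this point.
        for cand in batch:
            if cand == password:
                log.append("cracked: " + cand)
        batch.clear()

    for suffix in suffixes:                   # one toy "rule" per suffix
        log.append("rule: append " + suffix)  # logged when generation starts
        for word in words:
            batch.append(word + suffix)
            if len(batch) == batch_size:
                flush()
    flush()                                   # hash whatever is left over
    return log


log = simulate(["asdf", "qwer"], ["1998", "1999", "2000"], "asdf1998", batch_size=4)
print("\n".join(log))
```

The crack of asdf1998 is logged after the line announcing the 1999 rule, even though the 1998 rule produced it, because both rules' candidates shared one batch.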

Aug 11 '20 18:08 AlekseyCherepanov

This has been discussed earlier. It's definitely possible or even trivial (to implement) but it would come with performance issues including (but not limited to) memory issues. Some formats use batches of millions of passwords, I think sometimes tens of millions. And "single mode" is especially tricky (yet is very dependent on good rules). Just storing a few bytes of extra information per candidate would quickly become many megabytes of information that needs to be copied, kept and transferred, and that might blow caches and so on.
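As a rough worked example of that memory cost, assuming a 4-byte rule index per candidate (the width and the batch sizes below are illustrative assumptions, not measurements from any particular format):

```python
def extra_bytes(batch_size, bytes_per_candidate=4):
    """Extra memory needed to tag every candidate in a batch with a rule index."""
    return batch_size * bytes_per_candidate


for batch in (1_000_000, 10_000_000, 50_000_000):
    mib = extra_bytes(batch) / 2**20
    print(f"{batch:>11,} candidates -> {mib:6.1f} MiB of rule indices")
```

At ten million candidates per batch this is already 40,000,000 bytes per buffer, before counting any copies made on the way to the hashing code.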

So the trick is probably to implement it in a way that we can opt to enable it at whatever performance costs but that does not hurt performance unless used. Definitely doable, there simply wasn't anyone (yet) that found it important enough to spend time on.

If/when we implement it we could (with even more overhead that would need to be amortized, hidden or by other means tricked away) have really cool functions like automagically re-ordering rules by how successful they were over time, even dropping (perhaps temporarily) rules that never got any crack, and so on.
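A sketch of what such re-ordering could look like once per-rule crack counts exist. The function name `reorder` and the counts are hypothetical; only the rule strings use real rule syntax:

```python
def reorder(rules, cracks, drop_unsuccessful=False):
    """Return rules sorted by crack count, most successful first.

    The sort is stable, so rules with equal counts keep their original order.
    """
    ranked = sorted(rules, key=lambda r: cracks.get(r, 0), reverse=True)
    if drop_unsuccessful:
        ranked = [r for r in ranked if cracks.get(r, 0) > 0]
    return ranked


rules = [":", "l", "c", 'Az"1900"']
cracks = {"l": 3, 'Az"1900"': 12, ":": 3}  # made-up counts; "c" never cracked anything

print(reorder(rules, cracks, drop_unsuccessful=True))
```

Ranking by raw crack count is only one possible score. Cracks per candidate tested would be another, and would favor cheap rules over expensive ones.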

Aug 12 '20 00:08 magnumripper

Would it be a performance penalty even if rule logging is default off? In my world I would first do some test runs and collect statistics, build a rule set. Then turn off logging and run with the created rules at full speed. It would be a very good feature in some use cases - like CMIYC - if it is possible to implement without speed penalty when the option is off.

Aug 12 '20 07:08 AlbertVeli

> Would it be a performance penalty even if rule logging is default off?

We'd have to implement it carefully so it wouldn't (and this is why I never bothered with this yet, even though I've wanted this feature for years). I'm sure it's possible but it's not trivial (I think).

With even more careful design it might not even have to be a performance penalty when activated (so it would be "on" all the time, no need for any option). I'm not sure that's possible at all, but it probably is.

Aug 16 '20 19:08 magnumripper

PerRuleStats, added in #5010, partially addresses the needs expressed here.

> bullshit1900   (TBu519)  ('lAz"1900"<+')

We do not currently print a rule like that - it's still only in the log file. However, if we add such printing of the current rule now and PerRuleStats is enabled, that rule would actually be the one producing the crack. So maybe we should enable such printing only in wordlist mode (not in Single mode) and only when PerRuleStats is enabled - in fact, we can always do it when PerRuleStats is enabled - although I typically run it with low verbosity (planning to process the log file instead), which would hide these entire lines, as intended.

> this would allow to
>
> * quickly share successful rules with team

OK, that's not addressed yet.

> * easily create new targeted ruleset based on only those that gave cracks

This is addressed by #5010 - the added comment in john.conf gives specific commands to process a log file into a new rule set. The new rules-by-score.conf and rules-by-rate.conf also include commands showing how they were generated.

> * learn rule syntax by example

OK, I suppose log files are worse in this respect.

> Also it might be helpful to print base word too.

Yes, and this could be used to auto-re-order wordlists like we now do for rules - however, this is trickier and would have greater performance and memory usage impact.

Feb 12 '22 20:02 solardiz

> candidates in buffer for hashing may be generated by different rules, so a crack from old rule may happen after line about new rule.

Enabling PerRuleStats prevents this problem, but it does have some performance impact. When the wordlist is many times larger than the buffer, the performance impact is low. Otherwise, it can sometimes be high.
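One way to see why the ratio of wordlist size to buffer size matters is a back-of-the-envelope model. It rests on two assumptions that are not stated above: that the buffered candidates are hashed out at every rule boundary when PerRuleStats is on, and that a partially filled batch costs as much to hash as a full one. The function names and numbers are illustrative only:

```python
from math import ceil


def batches(words, rules, batch_size, flush_per_rule):
    """Number of hashing batches needed for words x rules candidates."""
    if flush_per_rule:
        # Each rule ends with its own, possibly mostly empty, final batch.
        return rules * ceil(words / batch_size)
    # Candidates from consecutive rules share batches.
    return ceil(words * rules / batch_size)


def overhead(words, rules, batch_size):
    """Ratio of batches with per-rule flushing to batches without it."""
    flushed = batches(words, rules, batch_size, True)
    shared = batches(words, rules, batch_size, False)
    return flushed / shared


print(overhead(10_000_000, 100, 4096))  # wordlist much larger than the buffer
print(overhead(1_000, 100, 4096))       # wordlist smaller than the buffer
```

Under this model a wordlist far larger than the buffer wastes only one partial batch per rule, while a wordlist smaller than the buffer pays for a nearly empty batch on every rule.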

Feb 12 '22 20:02 solardiz
