Feature: print exact matching rule for a crack
During CMIYC, I realized it would be a huge productivity boost to have a command-line option like --display-crack-rule which would show the exact rule that produced a crack.
Example output:
bullshit1900 (TBu519) ('lAz"1900"<+')
This would allow us to:
- quickly share successful rules with the team
- easily create a new targeted ruleset based only on the rules that gave cracks
- learn rule syntax by example
I suspect that with large rule sets like KoreLogic's, once a number of matching rules have been found, this would give a speedup of > 1000x for future cracking.
In order to implement this, every candidate would need to store an integer associating the candidate with the rule number/line.
It might also be helpful to print the base word too.
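To make the idea concrete, here is a minimal sketch of a candidate buffer that keeps a rule number and a base word number next to each candidate. This is not john's actual data structure; CandidateBatch, rule_ids, word_ids and origin are hypothetical names, and the rule strings are used only as labels (nothing here parses rule syntax).

```python
from array import array


class CandidateBatch:
    """Hypothetical buffer: candidates plus parallel arrays of origin info."""

    def __init__(self):
        self.candidates = []
        self.rule_ids = array("I")  # one unsigned int per candidate
        self.word_ids = array("I")  # index of the base word in the wordlist

    def add(self, candidate, rule_id, word_id):
        self.candidates.append(candidate)
        self.rule_ids.append(rule_id)
        self.word_ids.append(word_id)

    def origin(self, index):
        """Return (rule number, base word number) for a candidate index."""
        return self.rule_ids[index], self.word_ids[index]


# Toy data: rule text by line number and a two-word wordlist.
rules = [":", "l", 'Az"1900"']
words = ["Bullfrog", "asdf"]

batch = CandidateBatch()
batch.add("asdf", 0, 1)
batch.add("asdf1900", 2, 1)

# Pretend candidate 1 was reported as a crack and look up where it came from.
rule_id, word_id = batch.origin(1)
report = f"{batch.candidates[1]} (rule {rules[rule_id]!r}, base word {words[word_id]!r})"
print(report)
```

The parallel arrays are what would cost memory and bandwidth: two unsigned integers per candidate in this sketch.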
A quick way to get something like that is to check rule expansion in the log (maybe with --log-stderr or LogCrackedPasswords=Y). But there is a problem: candidates in the buffer for hashing may have been generated by different rules, so a crack from an old rule may be reported after the log line about a new rule.
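The buffering problem can be shown with a toy simulation. This is only a sketch under simplifying assumptions: simulate, the rule names and the lambdas are made up for illustration, and the "log line" is modeled as a variable updated when a rule starts generating candidates.

```python
def simulate(words, rules, batch_size, target):
    """Return (rule last logged when the crack shows up, rule that made it)."""
    batch = []          # (candidate, true_rule) pairs waiting to be hashed
    last_logged = None  # stands in for the most recent "rule" line in the log
    for rule_name, transform in rules:
        last_logged = rule_name  # logged when the rule starts, before hashing
        for word in words:
            batch.append((transform(word), rule_name))
            if len(batch) == batch_size:
                # The whole buffer is hashed only once it is full.
                for candidate, true_rule in batch:
                    if candidate == target:
                        return last_logged, true_rule
                batch = []
    # Hash whatever is left at the end of the run.
    for candidate, true_rule in batch:
        if candidate == target:
            return last_logged, true_rule
    return None


words = ["asdf", "qwer", "zxcv"]
rules = [
    ("append-1", lambda w: w + "1"),
    ("append-2", lambda w: w + "2"),
]

# With a buffer of 4, the three "append-1" candidates are still waiting
# when "append-2" is logged, so the log points at the wrong rule.
logged, actual = simulate(words, rules, batch_size=4, target="zxcv1")
print("log says:", logged, "| really produced by:", actual)
```

With a buffer that holds exactly one rule's worth of candidates (batch_size=3 here), the two values agree.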
Generating candidates with --stdout --log-stderr would work in a terminal, but the output gets buffered when redirected:
$ echo asdf | ./JohnTheRipper/run/john --verbosity=4 --log-stderr --pipe --stdout --rules=': Az"[0-9][0-9][0-9][0-9]"' 2>&1 | grep -B 1 asdf1999
asdf1998
asdf1999
Simple rules could be matched with the output from --stdout by candidate number. But rules that reject candidates break that.
So at present it does not seem possible to reliably match a crack with the rule that produced it.
This has been discussed earlier. It's definitely possible, even trivial, to implement, but it would come with performance issues including (but not limited to) memory issues. Some formats use batches of millions of passwords, I think sometimes tens of millions. And "single mode" is especially tricky (yet is very dependent on good rules). Just storing a few bytes of extra information per candidate would quickly become many megabytes of information that needs to be copied, kept and transferred, and that might blow caches and so on.
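As a rough back-of-the-envelope check of the memory argument, here is the arithmetic for a few batch sizes. The 4 bytes per candidate and the specific batch sizes are assumptions chosen to match "a few bytes" and "millions, sometimes tens of millions"; they are not measured values.

```python
def extra_bytes(candidates_per_batch, bytes_per_candidate):
    """Extra memory needed to tag every candidate in one batch."""
    return candidates_per_batch * bytes_per_candidate


BYTES_PER_CANDIDATE = 4  # assumed: one 32-bit rule number per candidate

for batch_size in (1_000_000, 10_000_000, 50_000_000):
    megabytes = extra_bytes(batch_size, BYTES_PER_CANDIDATE) / 1_000_000
    print(f"{batch_size:>11,} candidates -> {megabytes:.0f} MB extra per batch")
```

That extra data would also have to move wherever the candidates go, which is where the cache and transfer costs come from.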
So the trick is probably to implement it in a way that we can opt to enable it at whatever performance cost, but that does not hurt performance unless used. Definitely doable; it's just that no one has (yet) found it important enough to spend time on.
If/when we implement it we could (with even more overhead that would need to be amortized, hidden or by other means tricked away) have really cool functions like automagically re-ordering rules by how successful they were over time, even dropping (perhaps temporarily) rules that never got any crack, and so on.
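A sketch of what such re-ordering could look like, given per-rule crack counts from an earlier run. reorder_rules and cracks_per_rule are hypothetical names, and ranking by raw crack count is just one possible scoring choice.

```python
def reorder_rules(rules, cracks_per_rule, drop_unproductive=False):
    """Sort rules by crack count, most successful first.

    sorted() is stable, so rules with equal counts keep their original order.
    """
    ranked = sorted(rules, key=lambda rule: cracks_per_rule.get(rule, 0), reverse=True)
    if drop_unproductive:
        ranked = [rule for rule in ranked if cracks_per_rule.get(rule, 0) > 0]
    return ranked


# Toy rule labels and counts; the strings are not interpreted here.
rules = ["l", "u", "c", "r"]
cracks_per_rule = {"c": 5, "l": 2}

print(reorder_rules(rules, cracks_per_rule))
print(reorder_rules(rules, cracks_per_rule, drop_unproductive=True))
```

Dropping rules outright is the riskier option, since a rule with no cracks so far may still succeed against other hashes or wordlists.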
Would it be a performance penalty even if rule logging is default off? In my world I would first do some test runs and collect statistics, build a rule set. Then turn off logging and run with the created rules at full speed. It would be a very good feature in some use cases - like CMIYC - if it is possible to implement without speed penalty when the option is off.
Would it be a performance penalty even if rule logging is default off?
We'd have to implement it carefully so it wouldn't (and this is why I never bothered with this yet, even though I've wanted this feature for years). I'm sure it's possible but it's not trivial (I think).
With even more careful design it might not even have to be a performance penalty when activated (so it would be "on" all the time, no need for any option). I'm not sure that's possible at all, but it probably is.
PerRuleStats, added in #5010, partially addresses the needs expressed here.
bullshit1900 (TBu519) ('lAz"1900"<+')
We do not currently print a rule like that - it's still only in the log file. However, if we add such printing of the current rule now and PerRuleStats is enabled, that rule would actually be the one producing the crack. So maybe we should enable such printing only in wordlist mode (not in Single mode) and only when PerRuleStats is enabled - in fact, we can always do it when PerRuleStats is enabled - although I typically run it with low verbosity (planning to process the log file instead), which would hide these entire lines, as intended.
this would allow to
* quickly share successful rules with team
OK, that's not addressed yet.
* easily create new targeted ruleset based on only those that gave cracks
This is addressed by #5010 - the added comment in john.conf gives specific commands to process a log file into a new rule set. The new rules-by-score.conf and rules-by-rate.conf also include commands showing how they were generated.
* learn rule syntax by example
OK, I suppose log files are worse in this respect.
Also it might be helpful to print base word too.
Yes, and this could be used to auto-re-order wordlists like we now do for rules - however, this is trickier and would have greater performance and memory usage impact.
candidates in buffer for hashing may be generated by different rules, so a crack from old rule may happen after line about new rule.
Enabling PerRuleStats prevents this problem, but it does have some performance impact. When the wordlist is many times larger than the buffer, the performance impact is low. Otherwise, it can sometimes be high.
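A simplified model of why the wordlist-to-buffer ratio matters. It assumes the buffer is hashed whenever the rule changes, even if only partly full; that is an assumption made for this sketch, not a description of the actual implementation. batches is a hypothetical helper, and the model counts hashing passes only, ignoring everything else.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def batches(words, rules, buffer_size, flush_per_rule):
    """Number of hashing passes needed to try every word with every rule."""
    if flush_per_rule:
        # Each rule ends with its own (possibly mostly empty) batch.
        return rules * ceil_div(words, buffer_size)
    # Candidates from consecutive rules share batches.
    return ceil_div(words * rules, buffer_size)


BUFFER = 1_000_000
RULES = 100

for words in (10_500_000, 1_000):
    plain = batches(words, RULES, BUFFER, flush_per_rule=False)
    flushed = batches(words, RULES, BUFFER, flush_per_rule=True)
    print(f"{words:>10,} words: {plain} batches without flushing, {flushed} with")
```

In this model a wordlist much larger than the buffer adds only a few percent more batches, while a tiny wordlist turns one batch into one nearly empty batch per rule.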