john icon indicating copy to clipboard operation
john copied to clipboard

Duplicate rule suppression sometimes fails

Open solardiz opened this issue 3 years ago • 26 comments

As discussed in https://github.com/openwall/john/issues/5004#issuecomment-1019310089:

Shouldn't the current code have rejected a repeated rule like the below (when running -ru=jumbo)?

$ fgrep 'c Az"111"' sim3.log 
0:00:02:49 - Rule #923: '-c (?a c Az"111" <+' accepted as '(?acAz"111"<+'
0:00:02:49 - Score 0.012869018 at 10490.31 p/g 135g 1416192p during rule #923 :-c (?a c Az"111" <+
0:00:08:32 - Rule #2769: '-c (?a c Az"111" <+' accepted as '(?acAz"111"<+'
0:00:08:33 - Score 0.000000001 at 1416192000000000.00 p/g 0g 1416192p during rule #2769 :-c (?a c Az"111" <+

Obviously, it doesn't crack anything the second time this rule is tried. The log has this near the start:

0:00:00:00 - 72076 preprocessed word mangling rules were reduced by dropping 11 rules
0:00:00:00 - 72065 preprocessed word mangling rules

We should find out why the duplicate rule wasn't dropped.

solardiz avatar Jan 22 '22 22:01 solardiz

I guess fixing this issue will affect restored sessions, which may then skip some rules even if the input rule set has not changed. I don't know if we want to deal with this in any way (e.g., note the fix in .rec file, disable the fix when restoring from older .rec files that did not yet have the fix) or just let this be (and document it?)

Also, we'll have a similar issue with restoring older sessions if we enhance the suppression to recognize more kinds of effectively duplicate rules. In particular, we could recognize more kinds of effectively NOP commands, including some that are generated with the preprocessor in our rule sets - I mean those using the z and Z character classes and z position code. In fact, such effective NOPs should probably be squeezed out when accepting a rule, like we do for explicit NOPs.

So we could want to implement that extra NOP detection at the same time with fixing the bug, not to introduce the issue with restoring on two separate occasions.

solardiz avatar Feb 16 '22 15:02 solardiz

I tried to reproduce but my -ru=jumbo only has "8389 preprocessed word mangling rules were reduced by dropping 10 rules" and doesn't contain that rule.

magnumripper avatar Feb 16 '22 16:02 magnumripper

BTW I notice now we miss 'OneRuleToRuleThemAll' in -rules:all, I consider that a bug. Is there more? Was it intentional? I couldn't care less about resuming issues between releases, except it'd be nice to document them of course.

magnumripper avatar Feb 16 '22 16:02 magnumripper

Found others though. We seem to have a big problem here - many dupes are missed.

magnumripper avatar Feb 16 '22 16:02 magnumripper

Using -v:6, you get individual logging of what rules are removed. This is strange, for -ru:jumbo it ends up this:

$ grep -P "duplicate rule|reduced" ../run/john.log 
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1259: :
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1260: :
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1260: -s x**
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1261: -c %2[ ] T[0z] \p0[Q:] \p0[M:] va01 vbpa Tb Q M /[ ] vbpa Tb Q @?[Zw]
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1262: :
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1262: <* $1
2022-02-16 17:54:44 0:00:00:00 - duplicate rule removed at line 1263: -u %2E op\x89 ip\xc3 %2E op\x89 ip\xc3
2022-02-16 17:54:44 0:00:00:00 - 8389 preprocessed word mangling rules were reduced by dropping 10 rules

So it drops 7 rules but claims it was 10? Another interesting thing is it logs dropping before the preprocessor, at least in 1261. If it only sees rules before the pp, no wonder it would miss things and that would explain the OP.

On the other hand, I'm sure we talked about dupe suppression after preprocessing (or both before and after) back when Jim implemented this. I need to dig further.

magnumripper avatar Feb 16 '22 17:02 magnumripper

my -ru=jumbo [...] doesn't contain that rule.

It has changed since - to reproduce with the exact rule in my original comment, you need john.conf from back then.

we miss 'OneRuleToRuleThemAll' in -rules:all, I consider that a bug. Is there more? Was it intentional?

Intentional - it is supposed to be a subset of hashcat. However, I think we may want to replace hashcat with OneRuleToRuleThemAll in there - would probably crack almost as many, need to confirm.

I couldn't care less about resuming issues between releases

I do care, and it's not only between releases, but also between revisions - it is common and otherwise reasonable to git pull and rebuild once in a while.

As an option, maybe we should have the dupe rule suppression disabled when restoring older sessions (this would reintroduce a handful of dupe rules to those, but that's OK), and only enable it for new sessions and for restoring such new sessions. That way, we wouldn't need to keep the current broken behavior implemented for backwards compatibility.

solardiz avatar Feb 16 '22 17:02 solardiz

The code for dupe rules suppression is from 2011, pre git history. I need to search ML archives for whatever discussions we had back then.

magnumripper avatar Feb 16 '22 17:02 magnumripper

Also, only dropping before preprocessor is a reasonable mode - even if far less effective than what we could have. If that's the only issue here, then maybe we shouldn't call this a bug, but just preliminary implementation. We could then add post-pp as an enhancement, perhaps along with detection/squeezing of more kinds of effective NOP commands. And this is a rather low priority enhancement task then.

solardiz avatar Feb 16 '22 17:02 solardiz

BTW, my switching us to pre-optimized and thus pre-expanded default rules, with extra effective NOP squeezing with sed (as documented in the comments), lowers priority of us implementing any enhancements in code. Such enhancements would not affect our new default wordlist rules at all!

solardiz avatar Feb 16 '22 17:02 solardiz

we miss 'OneRuleToRuleThemAll' in -rules:all, I consider that a bug. Is there more? Was it intentional?

Intentional - it is supposed to be a subset of hashcat.

I see. The total accepted rules bumped a lot when I include 'OneRuleToRuleThemAll' but then again that may be because of this issue.

magnumripper avatar Feb 16 '22 17:02 magnumripper

The log says eg. 236188 preprocessed word mangling rules were reduced by dropping 68821 rules and the last rule for that session was Rule #167366: '-8,5' accepted. That suggests the suppression is happening after PP (although we have an off-by-one in the reporting).

Yet I did see duplicate rule removed at line 1261: -c %2[ ] T[0z] \p0[Q:] \p0[M:] va01 vbpa Tb Q M /[ ] vbpa Tb Q @?[Zw] so it might be a clue. Perhaps the dupe suppression somehow thought we were in "Hashcat mode" (so no pp) at that point?

magnumripper avatar Feb 16 '22 17:02 magnumripper

If this issue is not only the pre- vs. post-pp aspect, and we fix it, and OneRuleToRuleThemAll is literally a subset of hashcat (someone should check this - I did not), then including OneRuleToRuleThemAll before hashcat in our All would make sense.

solardiz avatar Feb 16 '22 17:02 solardiz

That suggests the suppression is happening after PP

Does it suggest that? I think it does not: the hashcat-derived rule sets are large, and don't use PP.

solardiz avatar Feb 16 '22 17:02 solardiz

That pp line is indeed a dupe line in the config

$ git grep -nF -- '-c %2[ ] T[0z] \p0[Q:] \p0[M:] va01 vbpa Tb Q M /[ ] vbpa Tb Q @?[Zw]' ..
../run/john.conf:1058:-c %2[ ] T[0z] \p0[Q:] \p0[M:] va01 vbpa Tb Q M /[ ] vbpa Tb Q @?[Zw]
../run/john.conf:1063:-c %2[ ] T[0z] \p0[Q:] \p0[M:] va01 vbpa Tb Q M /[ ] vbpa Tb Q @?[Zw]

magnumripper avatar Feb 16 '22 17:02 magnumripper

That pp line is indeed a dupe line in the config

Oh, probably a config file bug for us to fix, then? We could drop the dupe line, or we could first try and figure out why this happened - maybe it was supposed to differ in some way?

solardiz avatar Feb 16 '22 17:02 solardiz

That suggests the suppression is happening after PP

Does it suggest that? I think it does not: the hashcat-derived rule sets are large, and don't use PP.

Confirmed, you are right.

I'm very surprised if the suppression only happens before PP, and always did. That makes it so much less useful.

magnumripper avatar Feb 16 '22 17:02 magnumripper

[List.Rules:Test]
Az"[12]"
Az"[0-9]"
Az"[12]"
2022-02-16 18:43:28 0:00:00:00 - duplicate rule removed at line 117: Az"[12]"
2022-02-16 18:43:28 0:00:00:00 - 14 preprocessed word mangling rules were reduced by dropping 2 rules
2022-02-16 18:43:28 0:00:00:00 - 12 preprocessed word mangling rules

This is totally correct yet very misleading output.

magnumripper avatar Feb 16 '22 17:02 magnumripper

BTW, an issue we probably did not consider: same-looking line could be a different rule in native vs. hashcat mode because some rule command characters overlap and have different semantics. So the dupe suppression can potentially remove a rule that is not a dupe. I doubt this ever occurred so far, but technically our approach is incorrect. A workaround can be added.

solardiz avatar Feb 16 '22 17:02 solardiz

I saw the dupe suppression code taking that hashcat parameter in mind, so it might actually be OK

magnumripper avatar Feb 16 '22 17:02 magnumripper

Another caveat is that dupe suppression in batch mode has to be reset between single and wordlist. I see from old private conversations with Jim that he did take care of that as well.

magnumripper avatar Feb 16 '22 17:02 magnumripper

I see in the ancient private discussions, we had test cases like this:

[List.Rules:dupes]
-[:c] <* !?A \p1[lc] p
<* !?A \p1[lc] p
-c <* !?A \p1[lc] p

Jim's code did catch it (dropped the last line), and it still does. Perhaps the suppresion happens after rule reject flags but otherwise before PP. I need to look at the code instead.

Edit: Only the last line is dropped with current code. From the old mails I think the middle one was as well, back then.

magnumripper avatar Feb 16 '22 18:02 magnumripper

This one has no dupes removed. Probably never did

[List.Rules:Test]
Az"[12]"
Az"[0-9]"
Az"1"
Az"2"

magnumripper avatar Feb 16 '22 18:02 magnumripper

same-looking line could be a different rule in native vs. hashcat mode because some rule command characters overlap and have different semantics.

I saw the dupe suppression code taking that hashcat parameter in mind, so it might actually be OK

From dry-running the code, I think it only handles it correctly if the first line is a hashcat logic flag, so it seems to assume "all or none".

Edit: But then it uses rules_reject() to process, which does handle it, so I think the flag is maintained correctly. Then again, the actual dupe check doesn't care about that flag so you are probably correct.

magnumripper avatar Feb 17 '22 09:02 magnumripper

That pp line is indeed a dupe line in the config

Oh, probably a config file bug for us to fix, then? We could drop the dupe line, or we could first try and figure out why this happened - maybe it was supposed to differ in some way?

The dupe config line was added in 0402542a9 after you posted https://github.com/openwall/john/issues/3789#issue-431997416 with the dupe present.

magnumripper avatar Feb 17 '22 10:02 magnumripper

The dupe config line was added in 0402542 after you posted #3789 (comment) with the dupe present.

Looks like I arrived at this same rule in two different ways. We should simply drop the second copy.

solardiz avatar Feb 17 '22 10:02 solardiz

We get ~51~ 52 dupe rules (0.6%) with current --rules:jumbo. Using --rules:all there are ~63,566~ 63597 of them (0.8%). Using --rules-stack instead, the latter is 63,610, not sure why.

Edit: I was only counting lines logged as accepted as. There are also lines just saying accepted and dupes might be between those two variants. Fixed with a better one-liner.

magnumripper avatar Feb 18 '22 13:02 magnumripper