john Rule for finding last occurrence of character/class

A trivial example is "bump last digit": With current code, this can be done with eg. r /?d +p r but that is a pretty wasteful operation.

The %NX (or %N?C) command can currently be used with N=0 ~but it behaves exactly like N=1~ (EDIT not true, see below). This experimental patch changes N=0 to find the last occurence of a character or class:

diff --git a/src/rules.c b/src/rules.c
index a80c99e03..2f30e4af3 100644
--- a/src/rules.c
+++ b/src/rules.c
@@ -1177,12 +1177,12 @@ char *rules_apply(char *word_in, char *rule, int split, char *last)
 
 		case '%':
 			{
-				int count = 0, required, pos;
+				int count = 0, last = -1, required, pos;
 				POSITION(required)
 				CLASS_export_pos(0,
-				    if (++count >= required) break, {})
-				if (count < required) REJECT
-				rules_vars['p'] = pos;
+				    last = pos; if (++count >= required && required) break, {})
+				if (!count || count < required) REJECT
+				rules_vars['p'] = (last >= 0 ? last : pos);
 			}
 			break;

It will still reject if no occurence of the character/class was found. While the above patch doesn't seem to slow down normal use of %N, it appears to be 7-8% faster than that reverse-search-reverse rule, using (also simpler to write) %0?d +p rule.

Can this be achieved with current functionality using some other, better trick? Or is this patch something we'd consider to merge (along with documentation)?

Jul 19 '22 09:07 magnumripper

Alternatively, you could implement counting from the end if the number is negative, e.g., %-1?d meaning the last digit. But this might be overkill, because nobody is ever going to use this.

Jul 19 '22 09:07 frank-dittrich

Alternatively, you could implement counting from the end if the number is negative, e.g., %-1?d meaning the last digit. But this might be overkill, because nobody is ever going to use this.

I'm not sure the current code will manage negative numbers but we could make it do that. Your idea has some merit: I have rulesets similar to this for replacing any digits in a word with other digits:

# Replace any digits in word. Support for over 5 digits ends up in
# millions of rules after expansion so gets very slow
[List.Rules:ReplaceDigits]
/?d op[0-9]
%2?d op[0-9] /?d op[0-9]
%3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%5?d op[0-9] %4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
#%6?d op[0-9] %5?d op[0-9] %4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
#%7?d op[0-9] %6?d op[0-9] %5?d op[0-9] %4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
#%8?d op[0-9] %7?d op[0-9] %6?d op[0-9] %5?d op[0-9] %4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]

As it says, it becomes too large/slow for more than 5 or so digits. But with your suggestion we could have reduced cases that support eg. up to 4 first or last digits, something like this:

# Replace any digits in word. Supports up to four digits, then starts
# only replacing the four first or four last ones
[List.Rules:ReplaceDigits]
/?d op[0-9]
%2?d op[0-9] /?d op[0-9]
%3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%5?d %-1?d op[0-9] %-2?d op[0-9] %-3?d op[0-9] %-4?d op[0-9]

That will end up as 21110 rules. Full support up to 8 digits (uncommenting the last three lines in first example) ends up as 111 million rules - pretty much unusable...

Then again, for this particular example a hybrid external mode could be way more effective.

Jul 19 '22 09:07 magnumripper

Besides, we can do that today with the reverse-reverse trick, which become less wasteful with more operations between. This should do the same as the last example above - with current code:

# Replace any digits in word. Supports up to four digits, then starts
# only replacing the four first or four last ones
[List.Rules:ReplaceDigits]
/?d op[0-9]
%2?d op[0-9] /?d op[0-9]
%3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9]
%5?d r %4?d op[0-9] %3?d op[0-9] %2?d op[0-9] /?d op[0-9] r

Jul 19 '22 10:07 magnumripper

Originally posted by @solardiz in https://github.com/openwall/john/pull/5180#pullrequestreview-1043300643

An interesting aspect is what happens to this command's original behavior - rejection on too few matches. Previously, 0 would mean to never reject, right?

I saw no difference at all from %1 in my initial testing of it but I got it wrong and you are right. Our current rules neither use 0, any variable nor any pp macro though.

Could we ever want the old behavior? It would set p to the position of first digit or to end of word if no digit found. Say we have %0?d ipx:

john   -> johnx
jo3hn  -> jox3hn
jo3hn4 -> jox3hn4

I can't think of any obvious use of that but I think I'll ponder it for a while. We could go with Frank's suggestion, or pick a new mnemonic for this one or even just drop this idea - it's doable anyway with the reverse/reverse trick.

Jul 19 '22 18:07 magnumripper

Could we ever want the old behavior? It would set p to the position of first digit or to end of word if no digit found. Say we have %0?d ipx:
john   -> johnx
jo3hn  -> jox3hn
jo3hn4 -> jox3hn4

Another variant is %0: op-:

john            -> john
john:the        -> john-the
john:the:ripper -> john-the:ripper

This time it doesn't alter the input word where there's no match: We can't overwrite the end of the word but we can insert before it.

Jul 19 '22 19:07 magnumripper

@magnumripper I don't currently expect to have further comments on this. You decide. Thanks.

Jul 21 '22 15:07 solardiz

I'll close the PR and leave this issue open for future considerations

Jul 21 '22 18:07 magnumripper

There is also an unrelated documentation fix in that PR.

Jul 21 '22 18:07 solardiz

I have other pending typo fixes, will do a commit for them

Jul 21 '22 18:07 magnumripper

I've come to the conclusion the reverse-reverse trick is good enough, especially given the moderate speedup my patch gave. So closing.

Oct 31 '22 00:10 magnumripper