pcfg_cracker
Support gzip input in trainer.py
Hi,
With this patch it is possible to use a gzipped training file with trainer.py.
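For readers who want the general idea, here is a minimal sketch of one way to open a training file transparently, whether or not it is gzipped. It is not the code from this patch: the helper name `open_training_file` is hypothetical, and detecting gzip by its two magic bytes (rather than by the `.gz` extension) is an assumption made for illustration.

```python
import gzip

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_training_file(path, encoding="utf-8"):
    """Hypothetical helper: return a text-mode file object for `path`.

    If the file starts with the gzip magic bytes it is decompressed on
    the fly; otherwise it is opened as plain text.
    """
    # Peek at the header in binary mode so the check does not depend
    # on the file name or on the text encoding.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC

    if is_gzip:
        # "rt" makes gzip.open return decoded text lines, like open().
        return gzip.open(path, mode="rt", encoding=encoding)
    return open(path, mode="r", encoding=encoding)
```

Because both branches return a text-mode file object, code that iterates over the training passwords line by line would not need to know which kind of file it was given.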
POC:
$ python3 trainer.py -t ~/works/wl3/EnciclopediaItaliana.txt.gz -r test
____ __ __ ______ __
/ __ \________ / /_/ /___ __ / ____/___ ____ / /
/ /_/ / ___/ _ \/ __/ __/ / / / / / / __ \/ __ \/ /
/ ____/ / / __/ /_/ /_/ /_/ / / /___/ /_/ / /_/ / /
/_/ __/_/_ \___/\__/\__/\__, / \__________/\____/_/
/ ____/_ __________ __/_/_ / ____/_ _____ _____________ _____
/ /_ / / / /_ /_ / / / / / / / __/ / / / _ \/ ___/ ___/ _ \/ ___/
/ __/ / /_/ / / /_/ /_/ /_/ / / /_/ / /_/ / __(__ |__ ) __/ /
/_/____/__,_/ /___//__/\__, / \____/\__,_/\___/____/____/\___/_/
/_ __/________ _(_)___ /_/_ _____
/ / / ___/ __ `/ / __ \/ _ \/ ___/
/ / / / / /_/ / / / / / __/ /
/_/ /_/ \__,_/_/_/ /_/\___/_/
Version: 4.7
-----------------------------------------------------------------
Attempting to autodetect file encoding of the training passwords
-----------------------------------------------------------------
File Encoding Detected: utf-8
Confidence for file encoding: 0.99
If you think another file encoding might have been used please
manually specify the file encoding and run the training program again
-------------------------------------------------
Performing the first pass on the training passwords
What we are learning:
A) Identify words for use in multiword detection
B) Identify alphabet for Markov chains
C) Duplicate password detection, (duplicates are good!)
-------------------------------------------------
Printing out status after every million passwords parsed
------------
Number of Valid Passwords: 281923
Number of Encoding Errors Found in Training Set: 0
WARNING:
No duplicate passwords were detected in the first 100000 parsed passwords
This may be a problem since the training program needs to know frequency
info such as '123456' being more common than '629811'
-------------------------------------------------
Performing the second pass on the training passwords
What we are learning:
A) Learning Markov (OMEN) NGRAMS
B) Training the core PCFG grammar
-------------------------------------------------
Printing out status after every million passwords parsed
------------
-------------------------------------------------
Calculating Markov (OMEN) probabilities and keyspace
This may take a few minutes
-------------------------------------------------
OMEN Keyspace for Level : 1 : 120
OMEN Keyspace for Level : 2 : 1372
OMEN Keyspace for Level : 3 : 8760
OMEN Keyspace for Level : 4 : 42617
OMEN Keyspace for Level : 5 : 179047
OMEN Keyspace for Level : 6 : 653564
OMEN Keyspace for Level : 7 : 2109901
OMEN Keyspace for Level : 8 : 6062042
OMEN Keyspace for Level : 9 : 16246605
OMEN Keyspace for Level : 10 : 41594143
OMEN Keyspace for Level : 11 : 103770350
OMEN Keyspace for Level : 12 : 260785409
OMEN Keyspace for Level : 13 : 711119473
OMEN Keyspace for Level : 14 : 2486041579
-------------------------------------------------
Performing third pass on the training passwords
What we are learning:
A) What Markov (OMEN) probabilities the training passwords would be created at
-------------------------------------------------
-------------------------------------------------
Top 5 e-mail providers
-------------------------------------------------
-------------------------------------------------
Top 5 URL domains
-------------------------------------------------
;7"“'i/"lg.no : 1
ciicnrbit.it : 1
crora.no : 1
crsmologia.ca : 1
(ctir.ru : 1
-------------------------------------------------
Top 10 Years found
-------------------------------------------------
1900 : 1
1908 : 1
1911 : 1
1916 : 1
1943 : 1
1947 : 1
1969 : 1
1970 : 1
1974 : 1
1975 : 1
-------------------------------------------------
Saving Data
-------------------------------------------------
PW Length 1 : (10, 0)
PW Length 2 : (11, 0)
PW Length 3 : (12, 0)
PW Length 4 : (5, 15635)
PW Length 5 : (6, 26304)
PW Length 6 : (7, 33386)
PW Length 7 : (7, 38508)
PW Length 8 : (8, 39048)
PW Length 9 : (10, 34992)
PW Length 10 : (11, 28946)
PW Length 11 : (12, 21190)
PW Length 12 : (13, 13692)
PW Length 13 : (15, 8181)
PW Length 14 : (17, 4731)
PW Length 15 : (18, 2642)
PW Length 16 : (20, 1473)
PW Length 17 : (21, 730)
PW Length 18 : (23, 439)
PW Length 19 : (25, 233)
PW Length 20 : (26, 165)
PW Length 21 : (27, 133)
$ ../compiled-pcfg-matrix/pcfg_guesser -r Rules/test
____ __ __ ______ __
/ __ \________ / /_/ /___ __ / ____/___ ____ / /
/ /_/ / ___/ _ \/ __/ __/ / / / / / / __ \/ __ \/ /
/ ____/ / / __/ /_/ /_/ /_/ / / /___/ /_/ / /_/ / /
/_/ __/_/_ \___/\__/\__/\__, / \__________/\____/_/
/ ____/_ __________ __/_/_ / ____/_ _____ _____________ _____
/ /_ / / / /_ /_ / / / / / / / __/ / / / _ \/ ___/ ___/ _ \/ ___/
/ __/ / /_/ / / /_/ /_/ /_/ / / /_/ / /_/ / __(__ |__ ) __/ /
/_/ /__,_/ /___//__/\__, / \____/\__,_/\___/____/____/\___/_/
/_/
---------------------------> PURE C EDITION!!!
Version: 4.1
Loading Ruleset:Rules/test/
Initailizing the Priority Queue
Starting to generate guesses
dell
gt
mente
lt
zione
gt,
all
vano
nell
quest
dell,
gt.
dall
zioni
rono
lt,
mento
de
dell.
menti
deu
acqua
gt-
altra
lt.
all,
altro
amp
opera
mente,
deir
coll
azione
l,
Dell
de,
quell
zione,
dell-