hashkill
zip2.0 cracking dependency on payload length
hi, just wondering whether performance should be expected to be poor when attempting to crack a zipcrypted single-file archive where the inner file is very large (37gb).
does hashkill's zip module do the full naive decrypt, inflate, crc32 pipeline, or is there some heuristic distinguisher of valid deflate streams that can abort early on incorrect guesses?
how hard might this functionality be to implement if it does not yet exist?
further, how difficult would it be to modify hashkill for distributed cracking with a variable workforce, coordinated over a website or something? at a guess, you could do some kind of dynamic reallocation using the json statefiles and some extra helper process to consolidate completed portions of the guess space.
best regards, -nsh
Short answer: it depends.
Long answer: you may have noticed that with password-protected zip2.0, when you provide a wrong password it most of the time immediately reports "wrong password", but sometimes it starts decompressing the files and only returns a bad CRC or a decompression error later on.
Technically speaking, there is a single-byte (or two-byte) verifier value in the file header that can be used as a quick check. That gives a 1/256 (or 1/65536) probability of hitting a false positive, i.e. a wrong password passing the check. So once the verifier check has passed, we need to do the full decryption and decompression, and if that succeeds, we compare the CRC value from the header against the calculated one. With big files that may take a lot of time.
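To illustrate the quick check, here is a rough python sketch that follows the PKWARE appnote description of the traditional (zip2.0) encryption header; it is just an illustration, not hashkill's actual code:

```python
import zlib

def _crc32_byte(crc, b):
    # single-byte update of the raw CRC-32 register used by ZipCrypto
    return (zlib.crc32(bytes([b]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF) & 0xFFFFFFFF

def _update_keys(keys, b):
    k0, k1, k2 = keys
    k0 = _crc32_byte(k0, b)
    k1 = (k1 + (k0 & 0xFF)) & 0xFFFFFFFF
    k1 = (k1 * 134775813 + 1) & 0xFFFFFFFF
    k2 = _crc32_byte(k2, k1 >> 24)
    return k0, k1, k2

def _decrypt_byte(keys):
    t = (keys[2] | 2) & 0xFFFF
    return ((t * (t ^ 1)) >> 8) & 0xFF

def quick_verifier_check(password, enc_header, check_byte):
    """Decrypt the 12-byte ZipCrypto header that precedes the file data and
    compare the last decrypted byte against the expected check byte (the high
    byte of the entry's CRC-32, or of its DOS last-mod time when bit 3 of the
    general purpose flags is set; older tools compared two bytes). Passing
    this check still has a ~1/256 chance of being a false positive."""
    keys = (0x12345678, 0x23456789, 0x34567890)
    for c in password:
        keys = _update_keys(keys, c)
    last = None
    for c in enc_header:
        last = c ^ _decrypt_byte(keys)
        keys = _update_keys(keys, last)
    return last == check_byte
```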
hashkill can take advantage of archives with multiple files though, since each file within the archive has its own verifier value that we can check. If we have e.g. four files with four separate single-byte verifier values, the false-positive probability becomes about 1 in 4 billion, which greatly reduces the number of full decryption/decompression attempts and speeds up the cracking a lot.
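Reusing the quick_verifier_check sketch above, checking several entries against one password guess is just this (again an illustration, not how hashkill structures it):

```python
def multi_entry_check(password, entries):
    """entries: list of (enc_header, check_byte) pairs, one per file in the
    archive. With n independent single-byte verifiers the false-positive rate
    is roughly 1/256**n, so four entries give about 1 in 4 billion."""
    return all(quick_verifier_check(password, hdr, chk) for hdr, chk in entries)
```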
However, this is not possible when there is a single huge file in the archive. There are some optimizations to the decompression algorithm that try to rule out bad data early in the decompression process, but they offer limited performance benefits and AFAIK are currently only available when cracking on GPUs.
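For illustration, one simple heuristic of that kind (not necessarily what the GPU code does) is to inflate only a small prefix of the candidate plaintext and reject passwords whose output is not even a valid deflate prefix:

```python
import zlib

def plausible_deflate_prefix(candidate_bytes):
    """Try to inflate only the first few KB of decrypted data. A wrong key
    usually produces garbage that trips a decode error within the first
    deflate block, so most survivors of the verifier check die here instead
    of after decompressing gigabytes. Truncated-but-valid data does not raise."""
    d = zlib.decompressobj(-15)  # raw deflate, as stored inside zip entries
    try:
        d.decompress(candidate_bytes)
        return True
    except zlib.error:
        return False
```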
To make things worse, very small files are no good either, as there is still a small chance that they decompress successfully and yield the correct CRC value even though the password is incorrect. Thus, hashkill refuses to operate on files smaller than 100 bytes in the archive, so even having two files there, one very big and one very small, wouldn't help.
So there is little that can be done for the case with single big file in archive.
hashkill still doesn't handle multi-file archives in the best way though, so improvements could be made there. For example, hashkill always tries to decrypt/decompress the first file in the archive, no matter how many files there are, and that first file may be a huge one even when smaller files are available that would speed up the cracking if used instead (a rough sketch of that selection follows below). There are other attacks that could speed up the cracking a lot, e.g. having a separate copy of a file that is known to be in the encrypted archive can be exploited to improve speeds considerably, but hashkill does not take advantage of that at present.
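Picking a better entry than "the first one" is mostly a matter of scanning the central directory. A sketch using python's zipfile, just to illustrate the selection, with the 100-byte floor mentioned above as an assumed default:

```python
import zipfile

def best_entry_for_cracking(path, min_size=100):
    """Pick the encrypted member with the least compressed data, skipping
    anything below the ~100-byte floor mentioned above, instead of blindly
    taking the first entry. min_size is only an illustrative default."""
    with zipfile.ZipFile(path) as zf:
        encrypted = [i for i in zf.infolist()
                     if i.flag_bits & 0x1           # bit 0 = entry is encrypted
                     and i.file_size >= min_size]
        if not encrypted:
            raise ValueError("no suitable encrypted entry")
        return min(encrypted, key=lambda i: i.compress_size)
```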
As far as distributed attacks are concerned, in the GPU case you could look at virtualcl; it should theoretically work well for bruteforce and markov attacks, but not that well for rule-based attacks.
When cracking on CPUs there is no easy way to do that; the only option is to manually split the keyspace across the cracking instances. The statefiles do contain information about the portion of the keyspace already exhausted, but there is currently no way to remotely command a hashkill instance. It was planned, but I got distracted with other stuff and I can't tell you when (or whether) it will be implemented.
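The manual approach is literally just carving the keyspace into ranges and starting one instance per range, along the lines of this sketch (illustrative only; hashkill itself ships no such helper):

```python
def split_keyspace(total, workers):
    """Divide a brute-force keyspace of `total` candidates into contiguous
    [start, end) ranges, one per cracking instance. Each instance is then
    started by hand with its own range; this is the manual splitting
    described above, not a remote-control feature of hashkill."""
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# e.g. an 8-character lowercase keyspace split across 4 machines
print(split_keyspace(26 ** 8, 4))
```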
thanks for a timely and comprehensive answer! i'm going to keep grinding through the literature to see if it's possible to implement a divide-and-conquer attack with an absolutely spartan known plaintext that might be guessable from how the DEFLATE implementation operates and from knowledge of the payload file format (a very poorly documented acronis true-image format, but i might be able to reverse-engineer the vmware converter software, which can turn those files into vmware virtual disks). for context and motivation, see https://netzpolitik.org/2014/gamma-finfisher-hacked-40-gb-of-internal-documents-and-source-code-of-government-malware-published/
these are very bad people, and 37gb of their govt-malware source code may save lives in authoritarian regimes and generally improve the health of the internet and humanity.
thanks again for your input.