kaitai_struct_python_runtime
Process XOR/ROL optimisations
Implements: https://github.com/kaitai-io/kaitai_struct/issues/390 and https://github.com/kaitai-io/kaitai_struct/issues/397
I ran the test suite, but please confirm it on your end. I'll also leave running the benchmarks to you, because I'm not able to run those.
Benchmarks as requested:
```
>>> from timeit import *
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
12.807483946000048
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
13.325101311000026
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.117929144000072
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.115412523000032
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.979522146999898
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.68503412799987
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24798052300002
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24846534499989
```
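The numbers above suggest that computing the rotation per byte in Python is several times slower than a plain container lookup. As a minimal sketch of the direction these benchmarks motivate (my reconstruction, not the PR's exact code): precompute a 256-entry rotation table once, then let `bytes.translate()` apply it in C, avoiding the per-byte Python loop entirely (Python 3 shown):

```python
def process_rotate_left(data, amount):
    """Rotate each byte of `data` left by `amount` bits (illustrative sketch)."""
    amount &= 7                      # rotating an 8-bit value by 8 is a no-op
    if amount == 0:
        return bytes(data)
    # table[i] = i rotated left by `amount` bits within a byte
    table = bytes(((i << amount) & 0xFF) | (i >> (8 - amount))
                  for i in range(256))
    return bytes(data).translate(table)

# Example: 0b10000001 rotated left by 1 gives 0b00000011
assert process_rotate_left(b"\x81", 1) == b"\x03"
```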
All suggestions from @KOLANICH have been incorporated and resolved. Ready to merge.
It's a shame that Python doesn't have `>>>` and `<<<` operators, since both rotations have been implemented in hardware (more concretely: in both x86 and ARM) since ancient times, and we have to use these hacks. Could you create an issue in their bug tracker about it?
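For readers following along, these are the "hacks" in question: an illustrative sketch (not code from this PR) of emulating w-bit rotations with shifts and a mask, assuming `0 <= n < w`:

```python
def rotl(x, n, w=8):
    # rotate the w-bit value x left by n bits
    return ((x << n) | (x >> (w - n))) & ((1 << w) - 1)

def rotr(x, n, w=8):
    # rotate the w-bit value x right by n bits
    return ((x >> n) | (x << (w - n))) & ((1 << w) - 1)

assert rotl(0b10000001, 1) == 0b00000011
assert rotr(0b10000001, 1) == 0b11000000
```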
I would reach out to the python-ideas mailing list. The people there have authoritative knowledge about such things, and they may say that adding those operators is unlikely to happen or was already rejected. I will ask them.
OTOH, that would not affect us. Since the runtime needs to support older Python versions (including 2.7), even if they added `>>>` and `<<<`, we could not use them.
> Since the runtime needs to support older Python versions (including 2.7), even if they added `>>>` and `<<<`, we could not use them.
I wonder if it is possible to extend Python syntax with a native module, or at least add some built-in functions without much overhead.
I added some helper code that allows me to avoid maintaining two code paths.
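The PR's actual helper isn't shown in this thread; as a hypothetical sketch of what such a helper might look like (the name `byte_values` is mine), one way to share a single code path between Python 2 and 3 is to normalize buffers so that indexing always yields integers:

```python
import sys

if sys.version_info[0] >= 3:
    def byte_values(buf):
        # bytes indexing already yields ints on Python 3
        return buf
else:
    def byte_values(buf):
        # bytearray wraps a Python 2 str so indexing yields ints, not 1-char strs
        return bytearray(buf)
```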
Finished implementing the rotation, including larger groups.
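For context, rotation with larger groups means rotating each fixed-size group of bytes as one unit. A minimal sketch of the idea (Python 3 for brevity, big-endian groups assumed; the PR's actual implementation may differ):

```python
def process_rotate_left_groups(data, amount, group_size):
    """Rotate each group_size-byte group of `data` left by `amount` bits."""
    if group_size < 1:
        raise ValueError("group size must be at least 1")
    if len(data) % group_size != 0:
        raise ValueError("data length must be a multiple of the group size")
    bits = 8 * group_size
    amount %= bits
    mask = (1 << bits) - 1
    out = bytearray()
    for pos in range(0, len(data), group_size):
        # treat each group as one big-endian integer and rotate it
        group = int.from_bytes(data[pos:pos + group_size], "big")
        rotated = ((group << amount) | (group >> (bits - amount))) & mask
        out += rotated.to_bytes(group_size, "big")
    return bytes(out)

# Example: the 16-bit group 0x1234 rotated left by 4 bits gives 0x2341
assert process_rotate_left_groups(b"\x12\x34", 4, 2) == b"\x23\x41"
```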
@KOLANICH Please stop right there. What you are doing is a pointless mathematical exercise that doesn't prove anything. You are making huge implicit assumptions that aren't justified, and overall doing it wrong.
For one, you are assuming that keys follow a uniform distribution, which I think is wrong. You also seem to be assuming that key lengths are about as likely to be long as they are to be short. That's also probably wrong. In my personal experience, programmers tend to use very short keys, a few bytes at most. It seems unlikely to find kilobyte-sized keys in real formats, just due to how software developers think and work. It's also somewhat likely that all-zero keys can be found in real usage. Just because the format includes the possibility of XORing some data, it doesn't mean that it is actually and actively being used. Programmers often tend to use zero keys when they don't have a particular need for a different one. A safe default, so to speak. Note that I am not presenting those conjectures to justify my code, but rather to disprove your argument. Your argument resembles this kind of thinking: we have some real-world numbers, and digits are taken from a uniform distribution, then a few pages of equations... until someone points out that digits in real statistical data follow Benford's law.
For two, you are applying asymptotic equality to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N. Sedgewick, and I am sure you are familiar with this guy's work, points out in literally EVERY LECTURE he gave that using tilde notation would be more appropriate for that reason. That is, assuming it would be justified to use asymptotic notation in the first place.
The justification for the checks you are talking about is the following, and it does NOT depend on the conjectures I mentioned earlier (see the sketch after this list):
- The check is limited to 64 bytes, so in absolute terms it's cheap regardless of the key.
- If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
- If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is huge.
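A minimal sketch of the kind of check being argued about (my reconstruction under the stated 64-byte cap, not the PR's exact code): if the key is short enough to inspect cheaply and turns out to be all zeros, XOR is a no-op and the main loop can be skipped entirely.

```python
def process_xor_many(data, key):
    # Fast path: a key of <= 64 bytes is cheap to inspect in absolute
    # terms; if it is all zeros, XOR leaves the data unchanged.
    if len(key) <= 64 and not any(bytearray(key)):
        return data
    # Main loop: XOR every data byte with the key, repeating the key.
    result = bytearray(data)
    key_bytes = bytearray(key)
    key_len = len(key_bytes)
    for i in range(len(result)):
        result[i] ^= key_bytes[i % key_len]
    return bytes(result)
```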
> For one, you are assuming that keys follow a uniform distribution, which I think is wrong.
Of course it's wrong, since XOR in file formats is usually not used for crypto, but we can plug in any distribution we like by introducing an attribute giving the probability of a zero key. We may even specify simple distributions as expressions in the KS language (of course, that would require more math functions).
> You also seem to be assuming that key lengths are about as likely to be long as they are to be short.
I make no assumption about key length in the derivation, except that it is a non-negative number. The derivation should work for any length.
> For two, you are applying asymptotic equality to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N.
I can use any function under O(); remember, it's just mathematical notation meaning "the function in question is bounded by the function we write in the brackets after the O". When we want to estimate the complexity of algorithms, we don't distinguish between N and 2N, because 2N < N^2 for any N > 2. But since we, as you mentioned, are dealing with the real world, the difference between N, N+64 and 2N is significant to us. And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.
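To make the quantities both sides are arguing about concrete, here is one way to write down the costs (my notation, not from the thread): with $c$ the per-byte cost, $N$ the data length, $k$ the key length, and $[\cdot]$ an indicator,

$$T_{\text{plain}}(N) = cN, \qquad T_{\text{check}}(N) = c\,\min(k, 64) + cN \cdot [\text{key is not all zeros}].$$

Both are $O(N)$ in the worst case, so asymptotic notation cannot separate them; the real trade-off is a bounded overhead of at most $64c$ against a possible saving of the entire $cN$ term.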
> And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.
Half wrong. We want to choose the implementation that is realistically faster, not the one with the smaller theoretical asymptotic complexity.
I think you are completely missing the point, and just keep producing more equations.
> I can use any function under O(); remember, it's just mathematical notation meaning "the function in question is bounded by the function we write in the brackets after the O". When we want to estimate the complexity of algorithms, we don't distinguish between N and 2N, because 2N < N^2 for any N > 2. But since we, as you mentioned, are dealing with the real world, the difference between N, N+64 and 2N is significant to us. And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.
Listen, I like occasional math too. In fact, I find this slightly nostalgic, because I have a CS diploma and covered this theory in classes too. Setting aside the fact that we probably reside in different countries, I would not mind doing some calculus just for fun, over some marshmallows at a campfire. But this doesn't make your arguments valid. What you said in the quote is true, but it is also irrelevant.
I implore you to instead provide some counterargument to mine, if there is one:
- If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
- If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is significant.
Just added a small check that the group size is not negative.
I would prefer to remove the unused implementation before merging, but I left it in so you can look at it.
Added optimisations from https://github.com/kaitai-io/kaitai_struct/issues/411.
Please stop adding more and more stuff to a single PR. This greatly increases the chance that it won't be applied. And no, it's not an "optimization": it breaks the ability to read files that are being appended to during parsing, like live binary logs, packet captures, etc.
I apologise for putting so much stuff into a single PR. Please consider this PR frozen at this point, and related only to process XOR/ROL.
I advise reading it one commit at a time, and in split view.