
Process XOR/ROL optimisations

Open arekbulski opened this issue 6 years ago • 16 comments

Effectuates: https://github.com/kaitai-io/kaitai_struct/issues/390 https://github.com/kaitai-io/kaitai_struct/issues/397

I ran the test suite, but please confirm it. I also leave running the benchmarks to you, because I am not able to run them.

arekbulski avatar Apr 01 '18 15:04 arekbulski

Benchmarks as requested:

>>> from timeit import *
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
12.807483946000048
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
13.325101311000026
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.117929144000072
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.115412523000032
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.979522146999898
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.68503412799987
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24798052300002
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24846534499989

arekbulski avatar Apr 02 '18 20:04 arekbulski

All suggestions from @KOLANICH included and resolved. Ready to merge.

arekbulski avatar Apr 02 '18 20:04 arekbulski

64.24798052300002

It's a shame that Python doesn't have `>>>` and `<<<` operators, since both rotations have been implemented in hardware (more concretely: in both x86 and ARM) since ancient times, and we have to use these hacks. Could you create an issue in their bug tracker about it?
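
For reference, the hack in question is the shift-and-mask identity from the benchmarks, wrapped here as a helper (name is illustrative):

```python
def rol_byte(value, amount):
    # Emulate the missing rotate-left operator for an 8-bit value:
    # shift left, mask back to 8 bits, and OR in the bits that fell off.
    amount &= 7  # a rotation by 8 is a no-op for a byte
    return ((value << amount) & 0xFF) | (value >> (-amount & 7))
```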

KOLANICH avatar Apr 02 '18 20:04 KOLANICH

I would reach out to the python-ideas mailing list. The people there have authoritative knowledge about such things, and they may say that adding those operators will not happen or was already rejected. I will ask them.

OTOH, that would not affect us. Since this runtime needs to support older Python versions (including 2.7), even if `>>>` and `<<<` were added, we could not use them.

arekbulski avatar Apr 02 '18 20:04 arekbulski

Since this runtime needs to support older Python versions (including 2.7), even if `>>>` and `<<<` were added, we could not use them.

I wonder if it is possible to extend Python syntax with a native module, or at least add some builtin functions without much overhead.

KOLANICH avatar Apr 02 '18 21:04 KOLANICH

I added some helper code so that I don't have to maintain two code paths.

arekbulski avatar Apr 04 '18 10:04 arekbulski

Finished implementing the rotation, including larger groups.
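
To illustrate what rotation over larger groups means: each group of bytes is treated as one big-endian integer and rotated as a whole. This is a sketch of the idea, not the PR's actual code (it uses Python 3's `int.from_bytes` for brevity, which the 2.7-compatible runtime could not):

```python
def process_rotate_left(data, amount, group_size=1):
    # Rotate each group_size-byte group of `data` left by `amount` bits,
    # treating the group as one big-endian integer. Assumes len(data)
    # is a multiple of group_size.
    bits = 8 * group_size
    amount %= bits
    mask = (1 << bits) - 1
    out = bytearray()
    for i in range(0, len(data), group_size):
        g = int.from_bytes(data[i:i + group_size], 'big')
        g = ((g << amount) & mask) | (g >> (bits - amount))
        out += g.to_bytes(group_size, 'big')
    return bytes(out)
```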

arekbulski avatar Apr 04 '18 21:04 arekbulski

@KOLANICH Please stop right there. What you are doing is a pointless mathematical exercise that doesn't prove anything. You are making huge implicit assumptions that aren't justified, and overall doing it wrong.

For one, you are assuming that keys follow a uniform distribution, which I think is wrong. You also seem to be assuming that long key lengths are about as likely as short ones. That's also probably wrong. In my personal experience, programmers tend to use very short keys, a few bytes at most. It seems unlikely to find kilobyte-sized keys in real formats, just due to how software developers think and work. It's also somewhat likely that all-zero keys can be found in real usage. Just because the format includes the possibility of XORing some data, it doesn't mean that it is actually and actively being used. Programmers often tend to use zero keys when they don't have a particular need to use a different one. A safe default, so to speak. Note that I am not presenting those conjectures to justify my code, but rather to disprove your argument. Your argument resembles this kind of thinking: we have some real-world numbers, and digits are taken from a uniform distribution, then a few pages of equations... until someone points out that digits in real statistical data follow Benford's law.

For two, you are applying asymptotic equality to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N. Sedgewick, and I am sure you are familiar with his work, points out in literally EVERY LECTURE he gave that using tilde notation would be more appropriate for that reason. That is, assuming it would be justified to use asymptotic notation in the first place.

The justification for the checks you talk about is the following, and it does NOT depend on the conjectures I mentioned earlier:

  • The check is limited to 64 bytes, so in absolute terms it's cheap regardless of the key.
  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is huge.
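
As an illustration of the bounded-check idea (function name and the exact condition are hypothetical, not the runtime's real code; Python 3 sketch): take the fast path only when verifying it fits the 64-byte budget, so the check stays cheap in absolute terms:

```python
def process_xor_many(data, key):
    # Fast path: key inspection is capped at 64 bytes, so in absolute
    # terms the check is cheap. If the key fits in that budget and is
    # all zeros, XOR is a no-op and the data can be returned unchanged.
    if len(key) <= 64 and not any(key):
        return data
    # Main loop: XOR every data byte with the repeating key.
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))
```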

arekbulski avatar Apr 05 '18 12:04 arekbulski

For one, you are assuming that keys follow a uniform distribution, which I think is wrong.

Of course it's wrong, since XOR in file formats is usually not used for crypto, but we can plug in any distribution we like by introducing an attribute giving the probability of a zero string. We may even specify simple distributions as expressions in the KS language (of course it will require more math functions).

You also seem to be assuming that long key lengths are about as likely as short ones.

I have no assumption on key length in the derivation, except that it is a nonnegative number. The derivation should work for any length.

For two, you are applying asymptotic equality to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N.

I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms, we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.

KOLANICH avatar Apr 05 '18 13:04 KOLANICH

And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.

Half wrong. We want to choose the implementation that is realistically faster, not the one with the smaller theoretical asymptotic complexity.

I think you are completely missing the point and just keep producing more equations.

arekbulski avatar Apr 05 '18 14:04 arekbulski

I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms, we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one giving the lesser complexity.

Listen, I like occasional math too. In fact I find this slightly nostalgic, because I do have a CS diploma and had this theory in classes too. Aside from the fact that we probably reside in different countries, I would not mind doing some calculus just for fun, over some marshmallows at a campfire. But this doesn't make your arguments valid. What you said in the quote is true, but it is also irrelevant.

I implore you to instead provide a counterargument to mine, if there is one:

  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is significant.

arekbulski avatar Apr 05 '18 14:04 arekbulski

Just added a small check that the group size is not negative.

arekbulski avatar Apr 05 '18 15:04 arekbulski

I would prefer to remove the unused implementation before merging, but I left it so you can look at it.

arekbulski avatar Apr 05 '18 16:04 arekbulski

Added optimisations from https://github.com/kaitai-io/kaitai_struct/issues/411 .

arekbulski avatar Apr 06 '18 14:04 arekbulski

Please stop adding more and more stuff to a single PR. This greatly increases the chance that it won't be applied. And no, it's not an "optimization": it breaks the functionality of reading files which are being appended to during parsing, like live binary logs, packet captures, etc.

GreyCat avatar Apr 06 '18 14:04 GreyCat

I apologise for putting so much stuff into a single PR. Please consider this PR frozen at this point, and only related to Process XOR/ROL.

I advise reading it one commit at a time, and in split view.

arekbulski avatar Apr 06 '18 15:04 arekbulski