obliv-c
obliv-c copied to clipboard
Tweak counters for Yao garbled gates could improve.
Jack Doerner brought up this issue in a related commit about OT counters: https://github.com/jackdoerner/obliv-c/commit/2f6e99467a08e3d34c7f80a7e0873b768f347702#commitcomment-24208666
TL;DR: currently each thread uses a counter sequence from a random start offset. While this is usually enough, there are pathological cases where counter values (64-bit) could collide, causing a security problem. We should find something better.
Code reference: the variable k in yaoHalfMask2: https://github.com/samee/obliv-c/blob/obliv-c/src/ext/oblivc/obliv_bits.c#L658
@jackdoerner
RE: Previous comment chain, for the record:
when you consider code that uses many threads over its lifetime, but each only for a short duration
That's just bad code, at least from a performance standpoint :) The standard solution to that (not just in secure programs) is to have a "thread pool" that is reused from time to time.
Yes, I certainly hope nothing I write spawns 2^32 threads. But as you observed there are also some bad security implications and I feel like we should do our best to handle those gracefully, just in case. It's worth noting that if you're not very careful, it's possible to write OpenMP code that spawns a huge number of threads over its lifetime. The correct programmer response to this is probably not to split the protocol afresh each time it happens though.
- Have each thread use its own ypd->fixedKeyCipher, each with its own random key that both parties know about. This is probably the simplest solution.
- Have a central thread distributing counter ranges, with larger ranges for the thread that keeps on using it up. This sounds needlessly complicated to me.
I agree with your assessment here. I'll start working.
My understanding of line 676 in obliv_bits.c:
for(i=0;i<sizeof(k);++i) for(j=0;j<2;++j) buf[i+j*blen]^=((k>>8*i)&0xff);
Is that you're filling a 128 bit buffer with two copies of the 64 bit gate count. The simple way would seem to me to fill the first 64 bits with a protocol count. This should be trivial to implement. Thoughts? Note that protocol counts will have to be allocated using atomics in order to avoid collisions.
Ah, no actually. This is an example of a micro-optimization that helped, but made the code harder to read.
Yes, it is filling up the buffer with two copies of a 64-bit counter. But the buffer is not 128 bits ... it's 2*FIXED_KEY_BLOCKLEN = 256 bits. The idea was to use a single invocation of gcry_cipher_encrypt for computing two hashes, because for some reason that was providing a sharp speedup. The trick did not carry over to larger number of hashes, for reasons that was never clear to me.
In this fixedKeyCipher has been configured in ECB mode, so there is no internal counter in AES here, we have eliminated it, and are maintaining the counter outside.
Ahh, thanks for the clarification. My logic should still hold, yes? I assume that the higher half of the counter is always 0 as currently coded? If so, we could put the protocol ID there.
Re: gcry_cipher_encrypt and its speedup for two blocks but not more: maybe it pipelines two blocks at a time, when possible? Intel recommends pipelining calls to the AESNI instructions, but when I read the specs, it was suggestion 4 to 8 blocks simultaneously.
It should work. How do you plan to do this? A new thread counter in ypd? And how are we allocating that counter, atomic globals? And please don't use up all the free bits for this. I am okay with capping thread counts at something reasonable, like a #define set to 256. We can always raise it higher if we ever find a use for it.
It should work. How do you plan to do this?
Pretty much just as you described, I think. A cap sounds reasonable, as long as we have a way to eject with an error if someone hits it. We could use atomic globals, but would it be better just to have the first instance of YPD initialize a thread counter, and increment it with each instance split from that one or its children? I guess the real question is, are there security implications to ending a yao protocol and starting another (for the sake of argument, we'll say you do this 2^32 times)? If yes, I guess globals make sense.
Here is a preliminary version. Let me know if you have thoughts.