Oblivious Transfer Slice API
With OT extension + Beaver derandomization we are able to do one big setup for all the OTs we require during the execution of the protocol. Because various sub-protocols will execute concurrently, we need a way to partition this OT setup amongst them.
We would like to be able to use something akin to a slice API. For example, say we have three sub-protocols which require $N_1$, $N_2$, $N_3$ OTs to execute, where $N = N_1 + N_2 + N_3$:
// Receiver
let stream = Stream::new();
let mut ot = Receiver::new(stream.open_substream());
// Sends setup over the substream
ot.setup(N);
let ot_1: Receiver = ot.slice(N_1, stream.open_substream());
let ot_2: Receiver = ot.slice(N_2, stream.open_substream());
let ot_3: Receiver = ot.slice(N_3, stream.open_substream());
// Sender
let stream = Stream::new();
let mut ot = Sender::new(stream.next_substream());
// Receives setup over the substream
ot.setup();
let ot_1: Sender = ot.slice(N_1, stream.next_substream());
let ot_2: Sender = ot.slice(N_2, stream.next_substream());
let ot_3: Sender = ot.slice(N_3, stream.next_substream());
Now we have three pairs set up, each owning a substream between them, and they can be executed entirely concurrently. This API assumes a static setup procedure known by both the Sender and Receiver, which for our purposes at the moment should be an acceptable assumption.
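For concreteness, a minimal sketch of what slice could look like internally, assuming setup leaves the parent with a flat pool of pre-computed OTs. All types and fields here are hypothetical illustrations, not the actual implementation:

// Hypothetical sketch: the parent Receiver owns a pool of pre-computed
// OTs and hands out non-overlapping ranges of it. `Substream` stands in
// for whatever channel type the real transport provides.
struct Substream;

struct Receiver {
    ots: Vec<[u8; 16]>, // pre-computed OTs produced by setup
    consumed: usize,    // how many have been handed out so far
    substream: Substream,
}

impl Receiver {
    /// Split off the next `n` OTs into a child receiver that owns its
    /// own substream and can be driven concurrently with its siblings.
    fn slice(&mut self, n: usize, substream: Substream) -> Receiver {
        assert!(self.consumed + n <= self.ots.len(), "not enough OTs set up");
        let ots = self.ots[self.consumed..self.consumed + n].to_vec();
        self.consumed += n;
        Receiver { ots, consumed: 0, substream }
    }
}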
What if you do it such that there's a BatchOt object that takes mut refs to a bunch of prepared (but unexecuted) Ot objects? That way setup is almost identical (the only difference is you pass to the batching before execution), and use of the OT values is entirely identical (because everything just depends on wire values from the Ot). Does that make any sense?
I'm not sure I follow; can you provide pseudo-code for your suggestion? Are you talking about something like a setup factory for OTs?
let mut stream = Stream::new();
let mut ot_1 = Receiver::new(N_1, stream.open_substream());
let mut ot_2 = Receiver::new(N_2, stream.open_substream());
let mut ot_3 = Receiver::new(N_3, stream.open_substream());
BatchOt::setup(&mut [&mut ot_1, &mut ot_2, &mut ot_3], &mut stream).unwrap();
This doesn't feel very ergonomic
Something like that, though I agree it's clunky. My observation was that N is not going to be known in advance. Rather, a bunch of distinct MPC protocols are going to set up their own OT (of size N_i) without knowing about the other OTs going on. There should be some sort of builder that collects these setups, computes N, and respects the substream separations.
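One way to read that suggestion, sketched loosely (the builder and its methods are hypothetical names, not proposed API): each sub-protocol registers the count it needs, the builder sums them into N, and a single setup is then carved up in registration order, one substream per slice:

// Hypothetical builder sketch: sub-protocols declare their OT counts up
// front; after all registrations, one setup of the summed size is run
// and each registrant claims its slice by index.
struct OtBuilder {
    requests: Vec<usize>,
}

impl OtBuilder {
    fn new() -> Self {
        OtBuilder { requests: Vec::new() }
    }

    /// Register a need for `n` OTs; the returned index identifies the
    /// slice to claim once setup has run.
    fn request(&mut self, n: usize) -> usize {
        self.requests.push(n);
        self.requests.len() - 1
    }

    /// Total N across all registered sub-protocols, i.e. the size of
    /// the single setup to perform.
    fn total(&self) -> usize {
        self.requests.iter().sum()
    }
}

A finish step would then run setup(total) once and hand out one slice per registration, as in the earlier example.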
Do we really need to compute, or know, N in advance? I thought OT extension is so cheap that the number does not really matter. What about having a setup where different protocols can request/consume a number of OTs, and if more are requested than are available, either an error is returned or more OTs are produced? With concurrency in mind, it could be easier to just produce and distribute them instead of collecting the exact number beforehand.
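A rough sketch of that request/consume shape, with a policy switch between erroring out and extending on demand (again, all names here are hypothetical):

// Hypothetical pool sketch: consumers take OTs on demand; when the pool
// runs dry we either run more OT extension (cheap) or return an error,
// depending on policy.
enum PoolError {
    Exhausted { requested: usize, available: usize },
}

struct OtPool {
    available: Vec<[u8; 16]>, // pre-computed OTs
    auto_extend: bool,        // extend on demand instead of erroring
}

impl OtPool {
    fn take(&mut self, n: usize) -> Result<Vec<[u8; 16]>, PoolError> {
        if n > self.available.len() {
            if self.auto_extend {
                self.extend_by(n - self.available.len());
            } else {
                return Err(PoolError::Exhausted {
                    requested: n,
                    available: self.available.len(),
                });
            }
        }
        let at = self.available.len() - n;
        Ok(self.available.split_off(at))
    }

    fn extend_by(&mut self, n: usize) {
        // In the real protocol this would run another round of OT
        // extension with the peer; stubbed out here.
        self.available.extend(std::iter::repeat([0u8; 16]).take(n));
    }
}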
I thought OT extension is so cheap that the number does not really matter

Yes, this is correct; it is cheap indeed.
I'm gonna address specifically the multi-round TLS case: ~we're not gonna know in advance the size of the server response, so we will not be able to pre-compute with precision the exact amount of OTs needed~. EDIT: sorry, I remembered that during server response decryption in GC we're gonna be doing privacy-free GC, where the Notary is the garbler and it will be the Notary who inputs the ciphertext into the GC, thus there is 0 OT cost involved.
Specifically because extended OT is so cheap, I suggest we favor simplicity of the interface over any perceived performance gains.
I will back either approach (pre-computing extra OTs or running a fresh base OT setup when we run out) as long as it has a simple API.
P.S. Maybe someone could run benchmarks of the compute cost of computing extra OTs vs running a fresh base OT setup. I feel that computing millions of extra OTs will always beat a fresh base OT setup.
But even with the benchmarks, we have the problem of an un-optimized matrix transpose function in wasm (https://github.com/tlsnotary/tlsn/issues/4), on which the performance of computing extra OTs hinges.
Even if native benchmarks show one thing, it may not be the same for wasm.
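On the benchmarking P.S. above, a rough shape such a comparison could take, with both operations stubbed out as hypothetical stand-ins rather than real APIs:

use std::time::Instant;

// Hypothetical harness: `extend_ots` and `fresh_base_ot_setup` are
// placeholders for the two operations being compared.
fn extend_ots(n: usize) {
    // run OT extension to produce n additional OTs (stub)
    let _ = n;
}

fn fresh_base_ot_setup() {
    // run the base OTs plus a new extension setup (stub)
}

fn main() {
    let start = Instant::now();
    extend_ots(1_000_000);
    println!("extending 1M OTs took {:?}", start.elapsed());

    let start = Instant::now();
    fresh_base_ot_setup();
    println!("fresh base OT setup took {:?}", start.elapsed());
}

The same harness compiled to wasm would speak to the native-vs-wasm concern raised above.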
It would be worthwhile to estimate the order of magnitude of $N$, but I'm leaning towards just provisioning extra OTs as suggested above.
Quick benchmarks on my machine: 100K random OTs take 0.4s to set up, and 200K takes roughly double, so the cost is linear right now. This is deceiving because we're using a painfully naive matrix transpose function, as pointed out by Dan. The real cost of multiple setups, after we fix this, will be the network cost, where latency will dominate.
Most of the implementation complexity here is the multiplexing, i.e. making sure both ends of the pipe are hooked up in the right place.
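For reference, the bit-at-a-time transpose in question looks roughly like the following sketch (illustrative, not the actual code in the repo). OT extension has to transpose a large bit matrix, and doing it one bit per iteration is what makes the naive version painful:

fn transpose_bits(input: &[u8], rows: usize, cols: usize) -> Vec<u8> {
    // Naive bit-matrix transpose: one bit at a time, no word-level
    // tricks or cache blocking. Rows are packed MSB-first into bytes;
    // `rows` and `cols` are assumed to be multiples of 8.
    let mut out = vec![0u8; input.len()];
    for i in 0..rows {
        for j in 0..cols {
            let bit = (input[(i * cols + j) / 8] >> (7 - (j % 8))) & 1;
            // bit (i, j) of the input becomes bit (j, i) of the output
            out[(j * rows + i) / 8] |= bit << (7 - (i % 8));
        }
    }
    out
}

Word- or block-level transposes move 8x8 (or larger) bit blocks at a time, which is where most of the speedup would come from.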
Do you think it would make sense to improve the transpose implementation (probably implementing #4) to get a more accurate benchmark?
Yes, it makes sense. I'll @ you in that issue.
@sinui0,
100K random OTs take 0.4s to set up
did you time just ext OT, or is it for base OT + ext OT?
@themighty1 base OT + ext OT. The base OT setup is a constant 128 OTs for the extension, regardless of how many extended OTs are produced.
Closing, as this was implemented in #73