secretflow
question about TEEU architecture
I studied the following demo in TEEU documentation:
import numpy as np
import secretflow as sf

def average(data):
    return np.average(data, axis=1)
alice = sf.PYU('alice')
bob = sf.PYU('bob')
from secretflow.device import TEEU
# mrenclave can be omitted in simulation mode.
teeu = TEEU('carol', mr_enclave='')
a = alice(lambda: np.random.rand(4, 3))()
b = bob(lambda: np.random.rand(4, 3))()
# Transfer data to teeu.
a_teeu = a.to(teeu, allow_funcs=average)
b_teeu = b.to(teeu, allow_funcs=average)
# TEEU runs average.
avg_val = teeu(average)([a_teeu, b_teeu])
# Reveal the result once so the comparison below uses plain values.
avg_val = sf.reveal(avg_val)
print(avg_val)
a = sf.reveal(a)
b = sf.reveal(b)
np.testing.assert_equal(avg_val, average([a, b]))
My question: when alice and bob pass average as allow_funcs, there is a mechanism to prove that average is the function defined here, right?
In this demo, the function is defined locally. But in a real-world application, how can alice and bob ensure the functions allowed are safe functions? Does the authorization manager (carol) make this promise?
What if the project is only between alice and bob (which is true in most cases) and there is no carol at all?
@zhouaihui could you please have a look?
From other tutorials, I gather that users assume there is a copy of the code at each party: each of Alice, Bob, and Carol defines the same average in their local codebase.
If this is the case, my question is: what if the code is closed source?
In many cases, one of Alice and Bob, say Alice, is the data holder, while the other is the codebase holder. On the one hand, Alice won't let Bob see her data; on the other hand, Bob won't let Alice see his codebase, and there is no Carol. How should we use TEEU then?
Since SecretFlow recommends multi-controller mode for production use, the function you allow is the one defined locally in your own code. This means the functions executed in the TEE are restricted by what you have defined locally.
how can alice and bob ensure the functions allowed are safe functions
In practice, whether the functions are safe depends on your algorithm design: you need to decide what may be output or revealed, just as you decide which parts should be protected by homomorphic encryption when designing federated learning. TEEU only provides a mechanism for dynamically extending what can be executed, and every such trade-off must be agreed on by all parties. The details, such as remote attestation and dynamic behavior, are then handled automatically by the TEEU framework.
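To make the idea concrete, here is a toy model of the allow-list check. It is only a sketch and not how TEEU is implemented: the names fingerprint, OwnedData, share, and run_in_toy_enclave are hypothetical, and a real TEE enforces this inside attested hardware rather than in ordinary Python. Each owner attaches fingerprints of the functions it authorizes, and the executor refuses anything that is not authorized by every input owner.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(func):
    """Hash a function's bytecode and constants (toy stand-in for a code hash)."""
    code = func.__code__
    material = code.co_code + repr((code.co_consts, code.co_names)).encode()
    return hashlib.sha256(material).hexdigest()


@dataclass(frozen=True)
class OwnedData:
    owner: str
    value: list
    allowed: frozenset  # fingerprints of the functions the owner authorized


def share(owner, value, allow_funcs):
    """Hypothetical helper: attach the owner's allow-list to the data."""
    return OwnedData(owner, value, frozenset(fingerprint(f) for f in allow_funcs))


def run_in_toy_enclave(func, inputs):
    """Run func only if every data owner authorized exactly this function."""
    fp = fingerprint(func)
    for item in inputs:
        if fp not in item.allowed:
            raise PermissionError(f"{item.owner} did not authorize {func.__name__}")
    return func([item.value for item in inputs])


def average(rows):
    flat = [x for row in rows for x in row]
    return sum(flat) / len(flat)


def leak(rows):
    # A function nobody authorized: it would return the raw inputs.
    return rows


a = share("alice", [1.0, 2.0], allow_funcs=[average])
b = share("bob", [3.0, 4.0], allow_funcs=[average])
result = run_in_toy_enclave(average, [a, b])
print(result)
```

Calling run_in_toy_enclave(leak, [a, b]) raises PermissionError in this sketch, because neither owner listed leak. Whether average itself reveals too much is still a design decision the parties have to make.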
What if the project is only between alice and bob (which is true in most cases) and there is no carol at all?
For the second question, the TEE does not really care where it is located. Because the TEE has a remote attestation mechanism, the data provider does not have to trust a human operator (in this demo, Carol). It is fine to place the TEE machine at Alice's or Bob's site.
It depends.
If the data providers have to perform remote attestation to check how their data will be used, then closed-source code does not make sense here. You always have to open your code to the data providers so that they can verify your mrenclave, which is in effect a hash of the code.
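As a rough illustration of why the source has to be visible, the check a data provider performs can be sketched like this. It is a simplification under stated assumptions: a real MRENCLAVE is computed by the SGX hardware over the enclave's initial contents and arrives inside a signed quote, whereas here a plain SHA-256 over some bytes stands in for it, and measure and verify_measurement are hypothetical names.

```python
import hashlib
import hmac


def measure(enclave_source: bytes) -> str:
    """Toy measurement: SHA-256 of the code (stand-in for MRENCLAVE)."""
    return hashlib.sha256(enclave_source).hexdigest()


def verify_measurement(reported: str, published_source: bytes) -> bool:
    """The data provider recomputes the hash from the code it reviewed."""
    expected = measure(published_source)
    return hmac.compare_digest(reported, expected)


# Code that the data provider has read and accepted.
published = b"def average(data): return np.average(data, axis=1)"

# Measurement reported by an enclave running exactly that code.
reported_ok = measure(published)
# Measurement reported by an enclave running modified code.
reported_bad = measure(published + b"\n# plus something the provider never saw")

print(verify_measurement(reported_ok, published))
print(verify_measurement(reported_bad, published))
```

The provider can only compute the expected value if it has the code, which is why a hidden codebase and this style of verification do not fit together.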
However, there are some cases where closed source works. Say Alice provides the input, Bob provides the functions, and Bob wants to protect the functions. If the function is a SQL statement, Bob can keep the SQL query hidden under certain conditions:
- The SQL output is only ever opened to Alice; for example, all SQL outputs are encrypted with Alice's key. This part of the mechanism can be checked by remote attestation.
- Bob can dynamically execute his private (closed-source) SQL queries. Without access to the output, he cannot mount an effective attack.
This kind of feature will be supported in future TEE releases. Let me know if this is an important scenario.
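A minimal sketch of that scenario, purely to show the data flow. Nothing here is a SecretFlow or TEEU API: ToyEnclave, seal, and unseal are hypothetical, a shared symmetric key stands in for "Alice's key", and the SHAKE-256 XOR stream is a toy cipher without authentication that should not be used for real data.

```python
import hashlib
import secrets
import sqlite3


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream derived from key and nonce; illustration only.
    return hashlib.shake_256(key + nonce).digest(n)


def seal(key: bytes, plaintext: bytes):
    """Encrypt output so that only the key holder (Alice) can read it."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


class ToyEnclave:
    """Holds Alice's rows; the attested part is that output is always sealed."""

    def __init__(self, alice_key: bytes, rows):
        self._alice_key = alice_key
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        self._db.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    def run_private_query(self, sql: str):
        # Bob's query stays private; Bob only ever receives sealed bytes.
        rows = self._db.execute(sql).fetchall()
        return seal(self._alice_key, repr(rows).encode())


alice_key = secrets.token_bytes(32)
enclave = ToyEnclave(alice_key, [("north", 10), ("south", 32)])

# Bob submits a query that Alice never sees.
nonce, blob = enclave.run_private_query("SELECT SUM(amount) FROM sales")

# Only Alice can open the result.
recovered = unseal(alice_key, nonce, blob).decode()
print(recovered)
```

In this sketch Alice would attest only the fixed wrapper (run the query, seal the output to her key), not the query text itself, which is what lets Bob keep the query private.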
Yes, I think this is closer to our scenario than the demo, and I'm looking forward to it. In fact, I know of some similar solutions in Gramine's PPML tutorials, and I wonder whether there could be another implementation based on TEEU. After all, Gramine + Intel SGX may not be accepted in the future, for reasons you already know.
Yes, I think this is closer to our scenario than the demo, and I'm looking forward to it.
Thanks for your suggestion, we will take it into consideration for future support.
After all, Gramine + Intel SGX may not be accepted in the future, for reasons you already know.
Yes, we are working on running our code on Hygon CSV. Stay tuned!