secretflow question about TEEU architecture

I studied the following demo in the TEEU documentation:

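import numpy as np
import secretflow as sf
from secretflow.device import TEEU

# Assumes the cluster has already been initialized with sf.init(...)
# for the parties alice, bob and carol.

def average(data):
    return np.average(data, axis=1)

alice = sf.PYU('alice')
bob = sf.PYU('bob')

# mr_enclave can be left empty in simulation mode.
teeu = TEEU('carol', mr_enclave='')

a = alice(lambda: np.random.rand(4, 3))()
b = bob(lambda: np.random.rand(4, 3))()

# Transfer data to teeu, authorizing only `average` to run on it.
a_teeu = a.to(teeu, allow_funcs=average)
b_teeu = b.to(teeu, allow_funcs=average)

# TEEU runs average; reveal the result so it can be printed and compared.
avg_val = sf.reveal(teeu(average)([a_teeu, b_teeu]))
print(avg_val)

# Compare against the same computation on the revealed plaintext inputs.
a = sf.reveal(a)
b = sf.reveal(b)
np.testing.assert_equal(avg_val, average([a, b]))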

My question: when alice and bob pass average as allow_funcs, there is a mechanism to prove that the average being executed is the function defined here, right? In this demo, the function is defined locally. But in a real-world application, how can alice and bob ensure that the allowed functions are safe? Does the authorization manager (carol) make this promise? And what if the project involves only alice and bob (which is true in most cases) and there is no carol at all?

Jul 17 '23 13:07 lidh15

@zhouaihui could you please have a look?

Jul 18 '23 02:07 6fj

From other tutorials, I gather that each party is assumed to hold a copy of the code: Alice, Bob, and Carol each define the same average in their local codebase. If this is the case, my question is: what if the code is closed source? In many cases, one of Alice and Bob, say Alice, is the data holder, while the other is the codebase holder. On the one hand, Alice won't let Bob see her data; on the other hand, Bob won't let Alice see his codebase, and there is no Carol. How should we use TEEU then?

Jul 18 '23 02:07 lidh15

Since secretflow recommends multi-controller mode for production use, the function you allow is the one defined in your own local code. This means that what can be executed in the TEE on your data is restricted by you, locally.

how can alice and bob ensure the functions allowed are safe functions

In practice, whether the functions are safe depends on your algorithm design: you need to decide what may be output or revealed, just as you decide which parts should be protected by homomorphic encryption when designing a federated learning protocol. TEEU only provides a mechanism for dynamically extending what can be executed, and this trade-off must be agreed on by all parties. The remaining details, such as remote attestation and the dynamic behavior, are handled automatically by the TEEU framework.
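To make the allow-list idea concrete, here is a minimal toy sketch in plain Python. It is not secretflow's implementation: `ToyEnclave`, `fingerprint`, and the handle names are hypothetical, and hashing a function's bytecode is only an assumed stand-in for however the real framework identifies a function. It shows each data owner authorizing its own local copy of a function, and the enclave refusing anything else.

```python
import hashlib


def fingerprint(func):
    """Hash a function's bytecode, constants, and referenced names (toy identity)."""
    code = func.__code__
    material = code.co_code + repr(code.co_consts).encode() + repr(code.co_names).encode()
    return hashlib.sha256(material).hexdigest()


class ToyEnclave:
    """Hypothetical stand-in for a TEE that enforces per-data allow-lists."""

    def __init__(self):
        self._store = {}  # handle -> (data, set of allowed fingerprints)

    def upload(self, handle, data, allowed_fingerprints):
        self._store[handle] = (data, set(allowed_fingerprints))

    def run(self, func, handles):
        fp = fingerprint(func)
        inputs = []
        for handle in handles:
            data, allowed = self._store[handle]
            # Every owner of every input must have authorized this exact function.
            if fp not in allowed:
                raise PermissionError(f"{func.__name__} is not authorized for {handle}")
            inputs.append(data)
        return func(inputs)


def average(rows):
    flat = rows[0] + rows[1]
    return sum(flat) / len(flat)


def leak(rows):
    return rows  # returns the raw inputs; no owner authorized this


enclave = ToyEnclave()
# Each owner fingerprints its own local copy of `average`.
enclave.upload("alice_data", [1.0, 2.0], [fingerprint(average)])
enclave.upload("bob_data", [3.0, 4.0], [fingerprint(average)])

result = enclave.run(average, ["alice_data", "bob_data"])
```

Calling `enclave.run(leak, ["alice_data"])` raises `PermissionError`, because neither owner listed that function. Whether `average` itself reveals too much is still a design decision for the parties, as noted above.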

What if the project involves only alice and bob (which is true in most cases) and there is no carol at all?

For the second question, the TEE does not really care where it is located. Because the TEE has a remote attestation mechanism, the data provider does not have to trust a human operator (in this demo, Carol). It is fine to simply host the TEE machine on Alice's or Bob's side.

Jul 18 '23 03:07 icavan

It depends.

If the data providers have to perform remote attestation to check how their data will be used, then closed-source code does not make sense here. You always have to open your code to the data providers so that they can verify your mrenclave, which is in effect a hash of the code.
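The check a data provider performs can be sketched as follows. This is a simplified illustration under loud assumptions: a real MRENCLAVE is a measurement produced by the enclave build and load process and delivered in a hardware-signed report, whereas here `toy_measurement` is just a SHA-256 digest, the report is a plain dictionary, and signature verification is omitted. All names are hypothetical.

```python
import hashlib
import hmac


def toy_measurement(enclave_code: bytes) -> str:
    """Toy stand-in for MRENCLAVE: a SHA-256 digest of the enclave code."""
    return hashlib.sha256(enclave_code).hexdigest()


def verify_report(report: dict, expected_measurement: str) -> bool:
    """Accept the enclave only if it reports the measurement we computed ourselves."""
    return hmac.compare_digest(report["measurement"], expected_measurement)


# Code the data provider has audited and built locally.
audited_code = b"def average(data): return sum(data) / len(data)"
expected = toy_measurement(audited_code)

# Reports as they might arrive from two remote enclaves (hypothetical format).
honest_report = {"measurement": toy_measurement(audited_code)}
tampered_report = {"measurement": toy_measurement(audited_code + b"; send(data)")}

honest_ok = verify_report(honest_report, expected)
tampered_ok = verify_report(tampered_report, expected)
```

The provider can only compute `expected` if it has the code, which is why verifying the measurement and keeping the code closed pull in opposite directions.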

However, there are some cases where closed source works. Say Alice provides the input, whereas Bob provides the functions, and Bob wants to protect them. If the function is a SQL statement, it is fine for Bob to hide the SQL query under certain conditions:

The SQL output is only opened to Alice; for example, all SQL outputs are encrypted with Alice's key. This part of the mechanism can be checked by remote attestation.
Bob can dynamically execute his private (closed-source) SQL queries. Without access to the output, he cannot mount an effective attack.

This kind of feature will be supported in future TEE releases. Let me know if this is an important scenario.
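As a rough illustration of the closed-source SQL scenario described above (not a secretflow feature or API), the sketch below runs a private query inside a function standing in for the enclave and returns only ciphertext that Alice can open. Everything here is an assumption made for the example: the table layout, the function names, and especially `xor_keystream`, which is a toy cipher used in place of real authenticated or public-key encryption.

```python
import hashlib
import json
import secrets
import sqlite3


def xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHAKE-256 keystream. Illustration only, not real crypto."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(x ^ y for x, y in zip(data, stream))


def enclave_run_query(rows, private_sql: str, alice_key: bytes):
    """Run Bob's private query on Alice's rows; return only ciphertext for Alice."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
    db.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = db.execute(private_sql).fetchall()
    db.close()
    nonce = secrets.token_bytes(16)
    plaintext = json.dumps(result).encode()
    return nonce, xor_keystream(alice_key, nonce, plaintext)


# Assumed to be provisioned to the enclave by Alice after attestation.
alice_key = secrets.token_bytes(32)
alice_rows = [("x", 10), ("y", 32)]
# Bob's query, never shown to Alice.
bob_sql = "SELECT SUM(amount) FROM t"

nonce, ciphertext = enclave_run_query(alice_rows, bob_sql, alice_key)

# Only Alice, who holds the key, can recover the result.
recovered = json.loads(xor_keystream(alice_key, nonce, ciphertext))
```

Bob sees neither the rows nor the plaintext result, and Alice sees the result but not the query text. What attestation would need to cover in this design is the wrapper that forces every output through Alice's key, not the query itself.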

Jul 18 '23 03:07 icavan

Yes, I think this is closer to our scenario than the demo, and I'm looking forward to it. In fact, I know of some similar solutions in Gramine's PPML tutorials, and I wonder whether there is another implementation based on TEEU. After all, Gramine + Intel SGX may not be accepted in the future, for reasons you already know.

Jul 18 '23 06:07 lidh15

Yes, I think this is closer to our scenario than the demo, and I'm looking forward to it.

Thanks for the suggestion; we will consider it for future support.

After all, Gramine + Intel SGX may not be accepted in the future, for reasons you already know.

Yes, we are working on running our code on Hygon CSV. Stay tuned!

Jul 18 '23 09:07 zhouaihui
