`create_tf_dataset_from_all_clients()` produces a dataset with a single label and only one file

LamaCian opened this issue 2 years ago · 3 comments

I am trying to build a centralized dataset from a federated one. Each record in the data contains a `path`, a `client_id`, and a `label`.

First, I create a `ClientData` object using a function that takes the client's ID:

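```python
import tensorflow as tf
import tensorflow_federated as tff

# `df_aml` (the table of image paths, client IDs, and labels) and
# `parse_image` are defined elsewhere in my code.

def extract_file_paths(dataset):
    # Collect the `path` field of every element in the dataset.
    return [item["path"] for item in dataset]

@tf.function
def create_dataset(client_id):
    new_dataset = tf.data.Dataset.from_tensor_slices(dict(df_aml))
    client_id = int(client_id)
    client_id = tf.cast(client_id, dtype=tf.int64)

    # Keep only the rows that belong to this client.
    files = new_dataset.filter(lambda x: x['client_id'] == client_id)

    # Turn the filtered rows into a dataset of file paths, then load the images.
    list_ds = tf.data.Dataset.list_files(
        tf.py_function(func=extract_file_paths, inp=[files], Tout=tf.string))
    images_ds = list_ds.map(parse_image)

    return images_ds
```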
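For comparison, here is a minimal sketch of a per-client function that stays entirely within TensorFlow ops, so it should behave the same whether it is called with a Python string or traced in graph mode. It assumes the table has the `path`, integer `client_id`, and `label` fields described above; `TABLE` is a hypothetical stand-in for `df_aml`, and the image-decoding step (`parse_image` in the issue) is left out. It also avoids `tf.py_function` and `tf.data.Dataset.list_files` (which treats its input as glob patterns and shuffles the file names by default).

```python
import tensorflow as tf

# Hypothetical stand-in for `df_aml`: one row per image, with the
# `path`, `client_id` and `label` fields described in the issue.
TABLE = {
    "path": tf.constant(["a.png", "b.png", "c.png", "d.png"]),
    "client_id": tf.constant([0, 0, 1, 2], dtype=tf.int64),
    "label": tf.constant([3, 1, 0, 2], dtype=tf.int64),
}


def create_dataset(client_id):
    # `client_id` may be a Python str or a scalar tf.string tensor;
    # tf.strings.to_number handles both, unlike Python's int().
    client_id = tf.strings.to_number(client_id, out_type=tf.int64)
    rows = tf.data.Dataset.from_tensor_slices(TABLE)
    # Keep only this client's rows, without leaving TensorFlow.
    rows = rows.filter(lambda row: tf.equal(row["client_id"], client_id))
    # Emit (path, label) pairs; a real pipeline would decode the image here.
    return rows.map(lambda row: (row["path"], row["label"]))


for path, label in create_dataset("0"):
    print(path.numpy().decode(), int(label))
```

Whether this resolves the behavior reported in this issue is untested; the point is only that the same function can serve both the per-client and the centralized path.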

Creating the `ClientData`:

```python
client_ids = ['0', '1', '2']

client_data = tff.simulation.datasets.ClientData.from_clients_and_tf_fn(
    client_ids, create_dataset)

centralized = client_data.create_tf_dataset_from_all_clients()
```

I expected a dataset containing many files with different labels, but this code produces a dataset with only one file. Is this because I am using graph execution (`@tf.function`)?

I also tried creating the `ClientData` object with a similar function. It works in the federated setting and produces the expected per-client datasets, but the same function raises an error when I try to produce a centralized dataset.
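As far as I understand TFF, `create_tf_dataset_from_all_clients` builds the combined dataset roughly like this: the client IDs become a dataset of strings (possibly after shuffling their order), and the per-client function is applied to each one with `flat_map`. The sketch below mimics that in plain TensorFlow; `per_client_fn` and `centralize` are hypothetical names, not TFF's actual code. It illustrates that the per-client function then receives a symbolic `tf.string` tensor rather than a Python `str`, which is one plausible reason a function that calls Python's `int()` works per client but fails when centralizing.

```python
import tensorflow as tf


def per_client_fn(client_id):
    # Hypothetical per-client data: client k contributes k + 1 elements,
    # each equal to k. Only TensorFlow ops touch `client_id`, because under
    # flat_map it is a symbolic scalar tf.string tensor, not a Python str.
    k = tf.strings.to_number(client_id, out_type=tf.int64)
    return tf.data.Dataset.range(k + 1).map(lambda _: k)


def centralize(client_ids, dataset_fn):
    # Plain-TF approximation of building one dataset from all clients:
    # a dataset of ID strings, with each client's dataset concatenated
    # in order by flat_map.
    ids = tf.data.Dataset.from_tensor_slices(client_ids)
    return ids.flat_map(dataset_fn)


centralized = centralize(["0", "1", "2"], per_client_fn)
print([int(x) for x in centralized])  # [0, 1, 1, 2, 2, 2]
```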

Environment:

  • OS Platform and Distribution: macOS Big Sur 11.6.5
  • Python version: 3.9
  • tensorflow==2.8.0
  • tensorflow-federated==0.20.0

LamaCian, Jul 05 '22 19:07

@zcharles8

LamaCian, Jul 07 '22 07:07

Hi @LamaCian. I don't believe there is enough detail here to tell what is actually happening (e.g., what `df_aml` is). More generally, to really help here I would need a minimal reproduction of the issue.

That being said, can you verify that `create_dataset` computes the expected dataset for each client? If not, then this is not directly a TFF issue (it is more about how to use `tf.py_function`, as you allude to).
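One way to run this check is to build each client's dataset on its own and print how many elements and which labels it contains; if a single client already yields only one file, the problem lies in the per-client function rather than in centralization. A minimal sketch, where `summarize` and `toy_dataset_fn` are hypothetical names and each element is assumed to be a `(path, label)` pair; with the code in this issue one would pass `create_dataset` (or `client_data.create_tf_dataset_for_client`) instead and adapt `summarize` to whatever `parse_image` returns.

```python
import tensorflow as tf


def summarize(ds):
    # Return (number of elements, sorted distinct labels) for a dataset
    # of (path, label) pairs, iterating eagerly.
    labels = [int(label) for _, label in ds]
    return len(labels), sorted(set(labels))


def toy_dataset_fn(client_id):
    # Hypothetical per-client function: two images per client, with labels
    # that depend on the client ID. Called here with a Python str.
    k = int(client_id)
    paths = [f"client{k}/img{i}.png" for i in range(2)]
    return tf.data.Dataset.from_tensor_slices((paths, [k, k + 1]))


for cid in ["0", "1", "2"]:
    print(cid, summarize(toy_dataset_fn(cid)))
```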

zcharles8, Aug 19 '22 20:08

@zcharles8, yes, I can verify that `create_dataset` computes the expected dataset for each client in the federated setting. `df_aml` is just a CSV file containing the image paths divided into 4 clients. The dataset can be found here.

LamaCian, Aug 30 '22 05:08

Hi @LamaCian, I am closing this due to inactivity. As mentioned in my previous post, there is not enough detail to reproduce any of your code snippets above, so it is not possible to debug them.

zcharles8, Mar 16 '23 15:03