create_tf_dataset_from_all_clients() produces a single-label dataset with only one file
I am trying to build a centralized dataset from a federated one. The data contains a path, a client_id, and a label for each file.
First, I create a ClientData object using a function that accepts the client's ID:
def extract_file_paths(dataset):
    return [item["path"] for item in dataset]

@tf.function
def create_dataset(client_id):
    new_datset = tf.data.Dataset.from_tensor_slices(dict(df_aml))
    client_id = int(client_id)
    client_id = tf.cast(client_id, dtype=tf.int64)
    files = new_datset.filter(lambda x: x['client_id'] == client_id)
    list_ds = tf.data.Dataset.list_files(
        tf.py_function(func=extract_file_paths, inp=[files], Tout=tf.string))
    images_ds = list_ds.map(parse_image)
    return images_ds
Creating clientdata:
client_ids = ['0', '1', '2']
client_data = tff.simulation.datasets.ClientData.from_clients_and_tf_fn(
    client_ids, create_dataset)
centralized = client_data.create_tf_dataset_from_all_clients()
I expected a dataset with different labels and files, but this code produces a dataset with only one file. Is this because I am using graph execution (@tf.function)?
I also tried a similar function for creating the ClientData object. It works in the federated setting and produces the expected dataset for each client, but it raises an error when I use it to build a centralized dataset.
Environment:
- OS Platform and Distribution: macOS Big Sur 11.6.5
- Python version: 3.9
- tensorflow==2.8.0
- tensorflow-federated==0.20.0
@zcharles8
Hi @LamaCian. I don't believe there is enough detail here to tell what is actually happening (e.g. what df_aml is). More generally, to really help here I think I would need a more minimal reproduction of the issue.
That being said, can you verify that create_dataset computes the expected dataset for each client? If not, then this is not directly a TFF issue (and is more about how to use tf.py_function, as you allude to).
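One way to make that check concrete is to compute, straight from the table, what each client should contain, and then compare it with what create_dataset actually yields. The following is only a minimal sketch under stated assumptions: SAMPLE_CSV, its paths and label values, and the helper expected_files_per_client are hypothetical stand-ins for the data behind df_aml, which is not shown in this thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the CSV behind df_aml.
# Assumed columns: path, client_id, label (paths and labels are made up).
SAMPLE_CSV = """path,client_id,label
img/a0.png,0,healthy
img/a1.png,0,aml
img/b0.png,1,aml
img/c0.png,2,healthy
img/c1.png,2,aml
"""


def expected_files_per_client(csv_text):
    """Group (path, label) pairs by client_id using only the CSV contents."""
    per_client = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_client[row["client_id"]].append((row["path"], row["label"]))
    return dict(per_client)


expected = expected_files_per_client(SAMPLE_CSV)

# A centralized dataset built from all clients should contain every row once,
# so its size should equal the sum of the per-client sizes.
expected_total = sum(len(files) for files in expected.values())
expected_labels = {label for files in expected.values() for _, label in files}

for client_id, files in sorted(expected.items()):
    labels = sorted({label for _, label in files})
    print(f"client {client_id}: {len(files)} files, labels {labels}")
print(f"all clients: {expected_total} files, labels {sorted(expected_labels)}")
```

If the number of elements produced by create_dataset for a given client, or by the centralized dataset as a whole, differs from these reference counts, that narrows down whether the problem is in the dataset function itself or in how the per-client datasets are combined.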
@zcharles8, yes, I can verify that create_dataset computes the expected dataset for each client in the federated setting. df_aml is just a CSV file containing the image paths, divided among 4 clients. The dataset can be found here
Hi @LamaCian, I am closing this due to inactivity. As mentioned in my previous post, there is not enough detail to repro any of your code snippets above, so it is not possible to debug.