
How to change the total number of users in federated learning experiments? (e.g., https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py)

Open Yeojoon opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe.

I am currently running some FL experiments with the EMNIST dataset in the tensorflow-federated library (https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py). The default total number of users for this dataset is 3400 (when only_digits=False). Is there any way to change the number of users for a particular dataset?

If not, would it be possible to add this feature? I believe this feature can be very helpful to researchers!

Thank you very much in advance for your help!

Yeojoon avatar May 09 '23 14:05 Yeojoon

Hi @Yeojoon. One easy way to do this would be to subsample client IDs from EMNIST. This gives you a smaller total number of clients, but also reduces the total number of examples seen, so it's probably not right for all settings.
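A minimal sketch of the subsampling idea, written without TFF so it runs on its own. The helper name `subsample_client_ids` and the placeholder ID strings are made up for illustration; with a real dataset you would pass `client_data.client_ids` instead.

```python
import random


def subsample_client_ids(client_ids, num_clients, seed=0):
    """Return a reproducible random subset of client IDs (no repeats)."""
    client_ids = list(client_ids)
    if num_clients > len(client_ids):
        raise ValueError("num_clients exceeds the number of available clients")
    rng = random.Random(seed)  # fixed seed so the same subset is drawn each run
    return sorted(rng.sample(client_ids, num_clients))


# Placeholder IDs standing in for `emnist_train.client_ids` (hypothetical values).
all_ids = [f"client_{i:04d}" for i in range(3400)]
subset = subsample_client_ids(all_ids, num_clients=500)
print(len(subset))
```

With the actual dataset, each ID in `subset` could then be passed to `create_tf_dataset_for_client`. Only the selected clients' examples are used, so the total example count drops along with the client count.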

A more robust way to proceed would be to use tff.simulation.datasets.TransformingClientData, which allows you to take a ClientData (e.g., EMNIST) and expand each client into some number of sub-clients. This would allow you to experiment with larger population sizes.

zcharles8 avatar May 09 '23 15:05 zcharles8

If neither of these solutions is exactly what you're looking for, could you add some details about what kind of feature you have in mind?

zcharles8 avatar May 09 '23 15:05 zcharles8

Thank you for your quick and kind response!

What I want to do is increase or decrease the total number of users without changing the total number of data examples (for the EMNIST case, keeping the total number of training examples fixed at 341,873). So I agree with you that the first method may not solve this problem.

Do you think I can use your second solution to solve this problem?

Yeojoon avatar May 09 '23 15:05 Yeojoon

Could you provide a bit more detail here? How were you hoping to do this "re-partitioning", where the number of examples is fixed and the number of clients varies?

Note that [tff.simulation.datasets.TransformingClientData](https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/TransformingClientData) would allow increasing the number of clients, but would also increase the number of examples (essentially, each client would have their dataset "transformed" some number of times).
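To make the arithmetic concrete, here is a toy illustration of that expansion effect. It does not call TFF and is not how `TransformingClientData` is implemented; the helper `expand_clients` and the example counts are invented purely to show how the totals scale.

```python
def expand_clients(examples_per_client, factor):
    """Map each client to `factor` pseudo-clients.

    Every pseudo-client holds a (transformed) copy of its parent's examples,
    so it keeps the parent's example count.
    """
    expanded = {}
    for client_id, num_examples in examples_per_client.items():
        for k in range(factor):
            expanded[f"{client_id}_{k}"] = num_examples
    return expanded


# Hypothetical per-client example counts.
base = {"a": 100, "b": 120, "c": 80}
expanded = expand_clients(base, factor=3)

print(len(base), sum(base.values()))          # 3 300
print(len(expanded), sum(expanded.values()))  # 9 900
```

Both the number of clients and the number of examples grow by the same factor, so this route cannot hold the example count fixed.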

zcharles8 avatar May 09 '23 16:05 zcharles8

Do you mean the second solution increases the total number of examples?

I mean I would like to choose the total number of users freely. Let's say the total number of users is n. Then, for the EMNIST case, each user should hold about 341,873/n training examples.
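A rough sketch of this kind of re-partitioning, assuming all training examples are first pooled and addressed by an integer index. The function `repartition` is a made-up helper, not part of TFF; it shuffles the indices and cuts them into n near-equal shards, one per synthetic user.

```python
import random


def repartition(num_examples, num_clients, seed=0):
    """Split example indices 0..num_examples-1 into num_clients near-equal shards."""
    if not 1 <= num_clients <= num_examples:
        raise ValueError("num_clients must be between 1 and num_examples")
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # random assignment of examples to users
    base, extra = divmod(num_examples, num_clients)
    shards, start = [], 0
    for i in range(num_clients):
        size = base + (1 if i < extra else 0)  # first `extra` shards get one more
        shards.append(indices[start:start + size])
        start += size
    return shards


shards = repartition(num_examples=341_873, num_clients=1000)
sizes = [len(s) for s in shards]
print(len(shards), sum(sizes), min(sizes), max(sizes))  # 1000 341873 341 342
```

Each shard could then be turned into one client's dataset. Note that a uniformly random split like this makes the users approximately IID, so the natural per-writer heterogeneity of EMNIST would be lost.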

Yeojoon avatar May 09 '23 16:05 Yeojoon