flower Extend Strategy to provide a way to distinguish when a client is misbehaving

Describe the type of feature and its functionality.

The main functionality would give the Flower framework a structured way to exclude specific clients when they misbehave. Misbehavior is defined as a malicious act that works against the convergence of the machine learning model training.

The background is that the server aggregates the results from the clients, but many works suggest that clients in federated learning can sometimes misbehave. The next step is therefore to integrate an identification mechanism into the framework.

The suggestion is to extend Strategy by splitting the current aggregate_fit step into 2 steps:

validate_fit -> removes the clients that are not valid before the results are propagated to aggregate_fit
aggregate_fit -> continues as usual
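
The two steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not Flower's actual Strategy API: the class names, the `Update` tuple layout, the `validate_then_aggregate` helper and the norm-based validator are all hypothetical, and the real signatures in Flower differ.

```python
from typing import List, Tuple

# Hypothetical layout: (client_id, flat weights, num_examples).
Update = Tuple[str, List[float], int]


class SecureStrategySketch:
    """Hypothetical strategy with a validation step before aggregation."""

    def validate_fit(self, server_round: int, results: List[Update]) -> List[Update]:
        # Default: accept every client, so existing behaviour is preserved.
        return results

    def aggregate_fit(self, server_round: int, results: List[Update]) -> List[float]:
        # "Continue as usual": a FedAvg-style average weighted by num_examples.
        if not results:
            raise ValueError("no client updates to aggregate")
        total = sum(n for _, _, n in results)
        dim = len(results[0][1])
        return [sum(w[i] * n for _, w, n in results) / total for i in range(dim)]


class NormBoundStrategy(SecureStrategySketch):
    """Toy validator (an assumption, not FLDetector): drop large-norm updates."""

    def __init__(self, max_norm: float) -> None:
        self.max_norm = max_norm

    def validate_fit(self, server_round: int, results: List[Update]) -> List[Update]:
        return [
            r for r in results
            if sum(x * x for x in r[1]) ** 0.5 <= self.max_norm
        ]


def validate_then_aggregate(
    strategy: SecureStrategySketch, server_round: int, results: List[Update]
) -> List[float]:
    # The server would call validate_fit first and pass only the survivors on.
    valid = strategy.validate_fit(server_round, results)
    return strategy.aggregate_fit(server_round, valid)
```

With the default `validate_fit`, the result is the same as calling `aggregate_fit` directly, which is the compatibility behaviour intended here.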

Describe step by step what files and adjustments are you planning to include.

The main extra files:

src/py/flwr/server/strategy/secure_strategy.py
src/py/flwr/server/strategy/secure_fedavg.py
src/py/flwr/server/validation/fldetector.py (https://dl.acm.org/doi/abs/10.1145/3534678.3539231)
src/py/flwr/server/validation/external_api.py - Delegate responsibility for external existing validation systems
examples/advanced-security - Example for how to use fldetector with fedavg

The default behaviour of the secure strategy would be to keep the existing behaviour (for compatibility) and pass all values through.

Is there something else you want to add?

This is part of an ongoing Thesis project and I would be extremely appreciative of any feedback.

I already have a repository with some of the changes, but before I submit it as a PR I would like to get more feedback

Aug 25 '23 00:08 drorasaf

Hi, did you check that some work has already been done to integrate robust aggregation (AGR) functions into Flower? If not, look at Krum and the ongoing PR about Bulyan.

In a nutshell, you don't need to create all these new classes if your objective is to implement FLDetector. Just create a class inside fldetector.py (you can mostly copy what's written in krum.py) and write the aggregation algorithm into aggregate.py. Again, look at the function aggregate_krum(.) and copy the methodology.

Separating the filtering process (i.e. discarding the bad models with k-means) and the AGR function (i.e. averaging the models) does not make sense to me. It would be great if you wanted to replace the average with something else (e.g. the mean around the median), but that's not the case. I also don't get why you would like to create files 1, 2 and 4. Moreover, the fifth file should be named examples/fldetector.py or something along those lines, because naming it "advanced-security.py" implies that it is the SOTA methodology and the only one in the literature, which is false.
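
For reference, "the mean around the median" can be read as a coordinate-wise rule: for each parameter, average the k client values closest to that parameter's median. The sketch below is a minimal standard-library illustration under that reading; the function name and the parameter `k` are hypothetical and it is not code from Flower.

```python
from statistics import median
from typing import List


def mean_around_median(updates: List[List[float]], k: int) -> List[float]:
    """Per coordinate, average the k values closest to the median."""
    if not 1 <= k <= len(updates):
        raise ValueError("k must be between 1 and the number of updates")
    aggregated = []
    # zip(*updates) iterates over coordinates (columns) across clients.
    for column in zip(*updates):
        m = median(column)
        closest = sorted(column, key=lambda v: abs(v - m))[:k]
        aggregated.append(sum(closest) / k)
    return aggregated
```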

Sep 28 '23 17:09 edogab33

Yes, I have seen them. The main difference, and the suggestion here, is to separate the responsibility into 2 entities: one decides how to aggregate, the other decides what to aggregate. Krum and others perform these 2 responsibilities in one. FLDetector is able to detect misbehaving clients but is not responsible for the aggregation; hence it, like other similar algorithms, can tell what to aggregate. The how can then be FedAvg or any other type of aggregation, applied to the updates that should be aggregated. What's the reason not to separate those, given that this provides more modularity and better support for future innovation in these two domains? There are two immediate pros for such an action:

Infrastructure-wise, it's easier to interact with and develop it as a separate module. In most real-world cases the responsibility of the module is not just to filter but also to alert the relevant security team, so this logic would end up duplicated if we had to write it in every algorithm.
It allows implementing research on one of the topics and not both.
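
To make the "what" versus "how" split concrete, here is a minimal standard-library sketch. Every name in it (`make_pipeline`, `max_abs_validator`, the `on_reject` alert hook, the two aggregators) is hypothetical and only illustrates the idea; none of it is Flower's API or the FLDetector algorithm.

```python
from statistics import median
from typing import Callable, Dict, List, Optional, Tuple

Weights = List[float]
Updates = Dict[str, Weights]  # client_id -> flat weights
# A validator decides WHAT to aggregate: (accepted updates, rejected client ids).
Validator = Callable[[Updates], Tuple[Updates, List[str]]]
# An aggregator decides HOW to aggregate the accepted updates.
Aggregator = Callable[[List[Weights]], Weights]


def mean_aggregator(weights: List[Weights]) -> Weights:
    return [sum(col) / len(weights) for col in zip(*weights)]


def median_aggregator(weights: List[Weights]) -> Weights:
    return [median(col) for col in zip(*weights)]


def max_abs_validator(limit: float) -> Validator:
    """Toy rule (an assumption): reject updates containing a very large value."""
    def validate(updates: Updates) -> Tuple[Updates, List[str]]:
        accepted = {
            cid: w for cid, w in updates.items()
            if max(abs(x) for x in w) <= limit
        }
        rejected = [cid for cid in updates if cid not in accepted]
        return accepted, rejected
    return validate


def make_pipeline(
    validator: Validator,
    aggregator: Aggregator,
    on_reject: Optional[Callable[[List[str]], None]] = None,
) -> Callable[[Updates], Weights]:
    def run(updates: Updates) -> Weights:
        accepted, rejected = validator(updates)
        if rejected and on_reject is not None:
            on_reject(rejected)  # e.g. notify a security team, written once
        if not accepted:
            raise ValueError("all client updates were rejected")
        return aggregator(list(accepted.values()))
    return run
```

The same validator and alert hook can be paired with either aggregator without touching the filtering code, which is the modularity argued for here.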

Thanks for helping me figure out where and how the fldetector example should be placed directory-wise!

Oct 01 '23 00:10 drorasaf

@drorasaf I see, and I agree with you that, in general, filtering and aggregation should be treated as two separate concepts. However, the research community treats them as the same, which I guess is why Flower decided to do it that way. It is understandable, since doing it differently may be confusing for most people. On the other hand, as you said, separating the responsibilities results in better support and modularity. Just reasoning with you, I'm not a member of the team.

I thought that FLDetector was also responsible for the aggregation, which is why I wrote the previous comment. By the way, at the moment it may be faster to just implement FLDetector with FedAvg embedded.

Oct 20 '23 13:10 edogab33

Hi @drorasaf, I may be interested in comparing FLDetector with my ongoing work. Did you release the code somewhere?

Feb 01 '24 16:02 edogab33