PaddleFL
DPSGD question
Could you clarify some implementation details of DPSGD:
- where is the clipping operation implemented - on the client (trainer) or the aggregator (server)?
- where is the noise addition implemented - on the client (trainer) or the aggregator (server)?
- what is the purpose of batch_size for the server?
- the clipping operation is implemented on the client (trainer)
- the noise addition is implemented on the client (trainer)
- batch_size is only used for training on the client; it is not used for aggregation on the server.
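To illustrate what the client-side step looks like, here is a minimal NumPy sketch of per-example gradient clipping and Gaussian noise addition in the style of Abadi et al. (arXiv:1607.00133). The function name, shapes, and parameters are illustrative assumptions, not the actual PaddleFL API.

```python
import numpy as np

def dpsgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD gradient computation on the client (illustrative sketch).

    per_example_grads: array of shape (batch_size, n_params), one gradient
    row per training example.
    """
    batch_size = per_example_grads.shape[0]
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum the clipped gradients, add Gaussian noise calibrated to the
    # clipping bound, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / batch_size

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4)) * 10.0  # toy per-example gradients
update = dpsgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Both steps are cheap elementwise/norm operations per batch, which is why clipping and noising alone should not dominate training time.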
Thank you for the reply. Could you then explain, or give any clue, why DPSGD training is more time consuming? Network traffic also increases significantly compared with the FedAvg and SecAgg strategies. Clipping and adding noise do not seem to be computationally expensive operations (if implemented as in arXiv:1607.00133v2), and if no extra aggregation step is introduced, traffic should not increase much either.
Can you provide more details, such as logs or screenshots?