Ashwinee Panda

Results: 36 comments of Ashwinee Panda

@timudk I think you can use Xuechen's modified privacy-engine for this? https://github.com/lxuechen/private-transformers/blob/main/private_transformers/privacy_engine.py I'm not sure but it looks similar to my internal implementation of the augmentation multiplicity trick. It does...
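For context, the augmentation multiplicity trick averages the gradients of an example's K augmented views into a single per-example gradient before DP clipping, so each example still contributes exactly one clipped gradient. A minimal pure-Python sketch of that idea (function names like `average_over_augmentations` and `clip` are my own, not from Xuechen's privacy engine):

```python
import math

def average_over_augmentations(per_aug_grads):
    """Average the K augmented-view gradients of one example into a single
    per-example gradient (the augmentation multiplicity trick)."""
    k = len(per_aug_grads)
    dim = len(per_aug_grads[0])
    return [sum(g[i] for g in per_aug_grads) / k for i in range(dim)]

def clip(grad, max_norm):
    """Standard DP-SGD per-example clipping, applied after averaging."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

# One example seen under two augmentations; clip the averaged gradient once.
grads = [[3.0, 4.0], [1.0, 0.0]]
avg = average_over_augmentations(grads)   # averaged view gradients
clipped = clip(avg, 1.0)                  # one clipped gradient per example
```

In a real implementation the averaging happens inside the per-sample gradient machinery; the key invariant is that clipping sees one gradient per example, not one per augmentation.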

@AndreiBarsan how could it be a rounding error? The tests show `0 != 1.0`; there is no rounding error that would produce a full significant figure of difference.

In the first test the expected value is 0.5, in the second, 1.0; the code returns 0.0 for both. Your theory is plausible but it would only explain one of...

It looks like in `fetchpgd` we're trying to add a full gradient directly to a sketch, which is a shape mismatch. The sketching step should come after the PGD step.
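To illustrate the mismatch: a sketch lives in a compressed table, so a dense gradient can only be folded in through the sketching operator, after any PGD-style perturbation is applied in dense space. A toy single-row count sketch in plain Python (the `pgd_project` helper is a hypothetical stand-in for the attack step, not the repo's code; FetchSGD itself uses multi-row sketches):

```python
import random

class CountSketch:
    """Toy single-row count sketch: each coordinate hashes to one bucket
    with a random sign."""
    def __init__(self, dim, width, seed=0):
        rng = random.Random(seed)
        self.bucket = [rng.randrange(width) for _ in range(dim)]
        self.sign = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        self.width = width

    def sketch(self, vec):
        table = [0.0] * self.width
        for i, v in enumerate(vec):
            table[self.bucket[i]] += self.sign[i] * v
        return table

def pgd_project(grad, radius):
    """Hypothetical PGD-style projection of a dense gradient onto an L2 ball."""
    norm = sum(g * g for g in grad) ** 0.5
    if norm <= radius:
        return list(grad)
    return [g * radius / norm for g in grad]

# Correct order: perturb/project the dense gradient first, then sketch it.
cs = CountSketch(dim=4, width=2)
dense = [3.0, 0.0, 4.0, 0.0]
compressed = cs.sketch(pgd_project(dense, radius=1.0))
```

Adding `dense` (length 4) to `compressed` (length 2) elementwise is exactly the kind of shape error the comment describes.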

I think fetchpgd is some unfinished code, actually. It's supposed to be evaluating the adaptive attack combination between SparseFed and my other paper Neurotoxin. We actually have results for it,...

Oh, I think that for communication efficiency you should be using the main branch and not the attacks branch. In case you mean communication efficiency with robustness: FetchSGD is in...

Sure, so SparseFed doesn't introduce top-k. Top-k is introduced by some of the papers that we cite in FetchSGD; in particular, the mechanism that we use with memory comes from https://arxiv.org/abs/1809.07599...
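The memory mechanism referenced there is error feedback: coordinates dropped by top-k are accumulated locally and re-added to the next round's gradient, so nothing is permanently lost. A minimal sketch of that idea (my own simplified version, not the cited paper's exact algorithm):

```python
def topk_with_memory(grad, memory, k):
    """Top-k sparsification with error feedback: keep the k largest-magnitude
    coordinates of (grad + memory), carry the rest forward as new memory."""
    acc = [g + m for g, m in zip(grad, memory)]
    idx = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k]
    sparse = [0.0] * len(acc)
    for i in idx:
        sparse[i] = acc[i]
    new_memory = [a - s for a, s in zip(acc, sparse)]
    return sparse, new_memory

# The dropped coordinates accumulate and eventually get transmitted.
sparse, memory = topk_with_memory([1.0, -3.0, 0.5], [0.0, 0.0, 0.0], k=1)
```

Here only the `-3.0` coordinate is sent this round; `1.0` and `0.5` wait in memory for a later round.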

This is a great writeup, thanks a lot! I'm curious whether the fusing of optimizer with backward pass runs into any issues with FSDP.

> Kwargs should indeed not be passed. I would need a reproducer but feel free to open a PR for a fix! 😉

I will open a PR after cataloguing...

Hi @loadams, I'm getting this warning with a fresh install of deepspeed via `pip install transformers[deepspeed]`. Here's the output from the commands you posted above:

```
torch: 2.0.1
CUDA available: True
```

...