
A friendly (and maybe interactive?) introduction to differential privacy

Open · TedTed opened this issue 2 years ago · 2 comments

About the author

Hi folks, I'm Damien, an expert in differential privacy — a principled way of releasing statistics about sensitive data, without leaking information about individuals. I used to work at Google and am now at a startup called Tumult Labs.

As I was learning about differential privacy (DP) during my PhD, I became frustrated by the lack of accessible material helping people learn the basics of this field. So I wrote a blog post series to fill this gap. It's still a work in progress — there's so much to teach! — but a number of people have told me it's the best introductory material on DP out there. It's flattering but also a little sad; I'm a decent writer but not that great! DP is steadily gaining adoption, including in places that really matter for society, so we really should do a better job helping people understand how and why it works, and what it provides.

I have zero skills when it comes to video editing or making interactive Web things. I might be too late, but if someone thinks this is an interesting topic and would like to produce a video and/or an interactive explorable about it, I would be super happy to help.

Quick Summary

The most obvious thing to tackle first is to answer the question "what does DP guarantee, and why does the definition work?". Formally, an algorithm $A$ is $\varepsilon$-DP if for any two databases $D_1$ and $D_2$ that differ in a single record, and any possible output $O$, you have:

$$P[A(D_1) = O] \le e^\varepsilon \cdot P[A(D_2) = O]$$
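
To make the definition concrete, the classic Laplace mechanism is one way to satisfy it: add noise calibrated to how much one person can change the answer. A minimal Python sketch for a counting query could look like this (illustrative only; `dp_count` and the toy data are made up for the example):

```python
import numpy as np

def dp_count(records, predicate, epsilon):
    """Release a counting query with epsilon-DP via the Laplace mechanism.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon satisfies
    epsilon-DP for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy example: how many people in the dataset are over 40?
ages = [23, 35, 41, 58, 62, 19, 44]
print(dp_count(ages, lambda age: age > 40, epsilon=1.0))
```

Intuitively, the noise hides whether any single person was in the data, while the overall count stays roughly right.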

I could imagine a lesson that would get across two core points:

  • Why does this provide a meaningful privacy notion?
  • Why does this allow us to release useful statistics?

There is a lot to say about both aspects, but we could narrow it down to some of the core intuitions.
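
As a rough back-of-the-envelope illustration of the second point (assuming a counting query, the Laplace sketch above, and a true count that grows with the dataset size):

$$\text{typical noise magnitude} \approx \frac{1}{\varepsilon}, \qquad \text{relative error} \approx \frac{1}{\varepsilon \cdot \text{true count}}$$

So with $\varepsilon = 1$, a count over a million people gets noise on the order of $\pm 1$: enough to hide any single individual's contribution, but barely noticeable in the aggregate.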

The target audience would be people who have never heard about DP before and don't have a technical background; alternatively, we could target people who might have heard of it and can understand the definition technically, but don't yet have an intuition for why it works.

Target medium

That's up to you! I would be hyped for a video, an interactive article, or even something else that I'm not imagining yet.

More details

I recommend reading at least the first few articles of my blog post series to get an idea of the audience I'm trying to target there, and to maybe learn about DP and get excited about teaching it to others =)

Contact details

Feel free to ping me on damien (at sign goes there) desfontain (dot goes here) es, or on Twitter =)

Additional context

CC-BY is fine with me.

TedTed · Jul 02 '22 12:07

Minute Physics released an explanatory video on this topic a few years ago: https://www.youtube.com/watch?v=pT19VwBAqKA. I'm potentially interested in making a video/article on this topic, but I wouldn't want to rehash the same explanation that was covered quite well by Minute Physics. Do you have any ideas for a more targeted aspect of DP that would make for an engaging and unique topic?

JohnEdChristensen · Jul 16 '22 22:07

Great question! I like this video a lot, and I think alternative materials would still be super valuable. In particular:

  • The video focuses a lot on explaining database reconstruction attacks, which makes a lot of sense given the Census use case, but the same idea could likely be conveyed in a shorter time, leaving more space for people to learn about the definition itself.
  • The video's goal is to explain why the Census Bureau used the notion. It doesn't help viewers understand which types of use cases are a good fit for DP, or whether this is potentially something they could use themselves.
  • I'm not sure that the "slope of the plausibility curve" approach from Minute Physics is the easiest way to convey the intuition behind DP. It's not obvious how it connects to the typical formalization of DP (with two neighboring databases). This traditional formalization has a very simple intuition: if you could get the same result after removing a person, it means that you're not leaking information about that person.
  • To explain the "quantification of privacy" part of it, I think the Bayesian explanation of DP's guarantee is a super helpful idea, using the "betting odds" interpretation to avoid having to say "Bayesian statistics" (see the sketch after this list). I'd love to explain that in a friendly video format.
  • Finally (but this could almost be a separate video), the video doesn't address "how do you achieve DP", besides the addition of noise. There are other steps that are important in practice, so there is a lot of space to give people a deeper look at what DP will actually do to their data.
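
For reference, the "betting odds" version of that Bayesian guarantee follows in one line from the definition, assuming an attacker who is trying to decide between the two neighboring databases $D_1$ and $D_2$. By Bayes' rule, their posterior odds after seeing an output $O$ are:

$$\frac{P[D = D_1 \mid A(D) = O]}{P[D = D_2 \mid A(D) = O]} = \frac{P[A(D_1) = O]}{P[A(D_2) = O]} \cdot \frac{P[D = D_1]}{P[D = D_2]} \le e^\varepsilon \cdot \frac{P[D = D_1]}{P[D = D_2]}$$

Whatever the attacker would have bet beforehand, seeing the output can only shift their odds by a factor of at most $e^\varepsilon$ (and, by symmetry, at least $e^{-\varepsilon}$).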

TedTed · Jul 19 '22 08:07