
(Sec-CH-Flock + Client's IP address) == Unique Identifier?

Open ehsan opened this issue 5 years ago • 8 comments

It is quite likely that the combination of (Sec-CH-Flock + Client's IP address) creates a unique identifier for most interesting devices on the planet, where an interesting device is one that its human user has used for some web browsing activity.

The proposal says only that this combination might uniquely identify the user, and treating that as a mere possibility sounds far-fetched, to be honest (though this is something that we'd need some data about).

What is the plan to ensure that just by adding this one header we do not create a unique identifier?

ehsan avatar Aug 30 '19 18:08 ehsan
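
To make the concern above concrete, here is a minimal sketch of how one might measure it once data exists. It counts what fraction of devices are singled out by IP address alone, by flock value alone, and by the pair. The records and the helper name `unique_fraction` are made up for illustration; no real measurements are implied.

```python
from collections import Counter

def unique_fraction(records, key):
    """Fraction of records whose key(record) value is shared with no other record."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)

# Hypothetical observations, one (client IP, flock value) pair per device.
# The addresses come from documentation-only ranges; the flock labels are invented.
records = [
    ("203.0.113.7", "A1"),
    ("203.0.113.7", "B2"),
    ("203.0.113.7", "C3"),
    ("198.51.100.4", "A1"),
    ("198.51.100.4", "B2"),
    ("192.0.2.9", "A1"),
]

ip_only = unique_fraction(records, key=lambda r: r[0])
flock_only = unique_fraction(records, key=lambda r: r[1])
combined = unique_fraction(records, key=lambda r: r)

print(f"unique by IP alone:    {ip_only:.2f}")
print(f"unique by flock alone: {flock_only:.2f}")
print(f"unique by IP + flock:  {combined:.2f}")
```

In this toy data set neither signal singles out many devices on its own, but the pair singles out every one. Whether real traffic behaves this way is exactly the open data question.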

To clarify, this is the beginning of an open exploration, so there isn't a plan yet.

That said, the IP address + nearly anything is already a great tracking identifier. So there is a larger problem that needs to be solved here. I'm not aware of a way to deal with IP address tracking short of regularly rotating client IP addresses through a large prefix or proxying traffic.

I'd also point to the privacy budget as an overall mechanism for controlling joint entropy which also discusses IP address. The flock would need to fall within the privacy budget.

jkarlin avatar Aug 30 '19 18:08 jkarlin

> I'd also point to the privacy budget as an overall mechanism for controlling joint entropy which also discusses IP address. The flock would need to fall within the privacy budget.

This part wasn't obvious from the text!

ehsan avatar Aug 30 '19 19:08 ehsan

FYI, also relevant: we've updated the "What is Chrome's threat model for fingerprinting?" section of our security FAQ to call them "privacy bugs that we will try to resolve."

michaelkleber avatar Aug 30 '19 20:08 michaelkleber

Sorry I didn't have time for a full response before.

> That said, the IP address + nearly anything is already a great tracking identifier. So there is a larger problem that needs to be solved here. I'm not aware of a way to deal with IP address tracking short of regularly rotating client IP addresses through a large prefix or proxying traffic.

The specific problem here is that this is making the issue worse in concrete ways.

For example, Firefox telemetry data shows that the IP addresses of a percentage of our users change regularly over time. With FLoC, requests that would otherwise reach the network from different IP addresses would now carry another identifier (the FLoC key). That identifier is either stable or, if it changes, will probably change at different times than the IP address does, which would make linking the traffic across IP addresses easier.
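
A rough sketch of the linking described above, under simplifying assumptions: the observer keeps a request log, and treats requests that share a flock value and a /24 network prefix but come from different addresses as probably the same client. The log entries, the /24 choice, and the function name `link_across_ips` are all hypothetical.

```python
import ipaddress
from collections import defaultdict

def link_across_ips(requests, prefix_len=24):
    """Group requests by (flock, network prefix); keep groups that span several IPs."""
    groups = defaultdict(list)
    for req in requests:
        network = ipaddress.ip_network(f"{req['ip']}/{prefix_len}", strict=False)
        groups[(req["flock"], str(network))].append(req["ip"])
    linked = {}
    for key, ips in groups.items():
        distinct = list(dict.fromkeys(ips))  # de-duplicate, keep first-seen order
        if len(distinct) > 1:
            linked[key] = distinct
    return linked

# Hypothetical request log: the first client's IP rotates between day 1 and day 2,
# but its flock value stays the same.
log = [
    {"day": 1, "ip": "203.0.113.7", "flock": "A1"},
    {"day": 2, "ip": "203.0.113.52", "flock": "A1"},
    {"day": 1, "ip": "198.51.100.4", "flock": "B2"},
    {"day": 2, "ip": "198.51.100.4", "flock": "B2"},
    {"day": 2, "ip": "203.0.113.90", "flock": "C3"},
]

linked = link_across_ips(log)
print(linked)
```

A real observer would face many clients sharing a flock value, so this heuristic would produce false links. The point is only that a value that outlives an IP change gives the observer something to join on.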

> I'd also point to the privacy budget as an overall mechanism for controlling joint entropy which also discusses IP address. The flock would need to fall within the privacy budget.

Hmm, it is unclear to me how the privacy budget is a useful solution to this. As far as I understand, the privacy budget relies on measuring the overall entropy of existing features. FLoC is not an existing feature. Are there mechanisms to measure the additional entropy introduced by FLoC that are not mentioned in the explainer?

Without those, I take your suggestion to be: add FLoC to the Web Platform, measure the additional entropy that our user bases would be exposed to as a result, and then work to limit it when it exceeds the budget. But we don't know how many bits of entropy we're talking about before we add FLoC to the Web Platform, so we can't really reason about its additional risk in terms of exposing users to more detailed fingerprinting until then. (And there's nothing in the privacy budget AFAIK to suggest that the number of exposed bits at the budget level wouldn't be sufficient to e.g. uniquely identify an individual on the planet.)

This seems like circular logic, and I don't really follow it. What am I missing?

ehsan avatar Sep 03 '19 21:09 ehsan
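
For reference, the "additional entropy" asked about above can be written as the conditional entropy H(flock | IP) = H(IP, flock) - H(IP). The sketch below estimates it from observed pairs. It needs observations of deployed flock values as input, which is the circularity being pointed out. The sample data is invented and the population figure is a rough assumption.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical observations of (client IP, flock value); not real measurements.
observations = [
    ("203.0.113.7", "A1"),
    ("203.0.113.7", "B2"),
    ("198.51.100.4", "A1"),
    ("198.51.100.4", "B2"),
]

h_ip = entropy_bits(ip for ip, _ in observations)
h_joint = entropy_bits(observations)
added_by_flock = h_joint - h_ip  # H(flock | IP): bits the flock adds on top of the IP

# Bits needed to single out one person, assuming roughly 8 billion people.
bits_to_single_out = math.log2(8_000_000_000)

print(f"H(IP) = {h_ip:.2f} bits, H(IP, flock) = {h_joint:.2f} bits")
print(f"added by flock = {added_by_flock:.2f} bits")
print(f"bits to single out one person = {bits_to_single_out:.1f}")
```

Any budget set at or above the last figure would, in principle, leave room to identify an individual, which is the parenthetical worry above.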

And speaking of the relationship of privacy budget and FLoC, wouldn't FLoC fall under "passive fingerprinting" which privacy budget isn't trying to address?

ehsan avatar Sep 03 '19 21:09 ehsan

If FLoC is communicated via client hint, and access is delegated to third parties by the first party via Feature Policy, then it's more-or-less equivalent to active fingerprinting, risk-wise:

https://github.com/yoavweiss/client-hints-infrastructure

My understanding of the reasoning here is that, by default, FLoC (like all client hints) won't be delivered to third parties. The first party has to explicitly grant third parties access to it – just as they might explicitly choose to include "active" third party resources, which could be used for active fingerprinting. And that granting of access is visible to, and ultimately controllable by, the end user-agent.

eeeps avatar Sep 03 '19 22:09 eeeps
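
The delegation model described in the comment above can be sketched as a simple decision function. This models one reading of the client hints infrastructure proposal, not confirmed FLoC or Chrome behavior. The function name, the origins, and the shape of the `delegations` mapping are assumptions made for illustration.

```python
def should_send_hint(hint, top_level_origin, request_origin, delegations):
    """Decide whether a client hint accompanies a request.

    `delegations` maps a hint name to the set of third-party origins that the
    first party has explicitly allowed to receive it (a hypothetical structure).
    """
    if request_origin == top_level_origin:
        # First-party request: assume the first party has opted in to the hint.
        return True
    # Third-party request: send only if the first party delegated this hint.
    return request_origin in delegations.get(hint, set())

# Hypothetical page on news.example that delegates the hint to one ad server.
delegations = {"Sec-CH-Flock": {"https://ads.example"}}
page = "https://news.example"

to_first_party = should_send_hint("Sec-CH-Flock", page, page, delegations)
to_delegated = should_send_hint("Sec-CH-Flock", page, "https://ads.example", delegations)
to_other = should_send_hint("Sec-CH-Flock", page, "https://tracker.example", delegations)

print(to_first_party, to_delegated, to_other)
```

Under this model an undelegated third party never sees the value passively, which is why the risk is compared to active fingerprinting.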

> My understanding of the reasoning here is that, by default, FLoC (like all client hints) won't be delivered to third parties. The first party has to explicitly grant third parties access to it – just as they might explicitly choose to include "active" third party resources, which could be used for active fingerprinting. And that granting of access is visible to, and ultimately controllable by, the end user-agent.

By reading the FLoC document, I understood the opposite: that the Sec-CH-Flock header would be sent over by Chrome with all HTTP requests (even in third-party context).

What's the point of view of the Chrome team on this question?

Thanks.

sukria avatar Mar 10 '20 15:03 sukria

@ehsan I think we're in agreement here. In the long run we don't want sites to be able to identify a user via IP address and FLoC combined. So either there isn't enough joint information to identify a user between them, or they need to be separated (e.g., see willful IP blindness).

@sukria We really haven't figured out the details of the header's behavior yet (e.g., is CH going to be partitioned per site?) but the idea is that third parties on a page could have access to the header. Whether that is delegated by the 1p or not is unclear. It's often the case that the 3p has script access in the 1p context, so I'm not sure it fundamentally makes much difference.

jkarlin avatar Mar 10 '20 15:03 jkarlin