
Server-side component

Open scottlow opened this issue 5 years ago • 11 comments

One question I've had about this explainer for a while is who owns the server-side component used to aggregate the deltas from clients and train the model at scale. Is the idea to have this owned by a single trusted entity in the industry? Or perhaps a consortium of entities? Or is there a world where multiple server-side components could securely/privately share their individual models to aid in the creation of one shared model?

If the latter is a possibility, it’d be great to understand more deeply what interfaces/protocols would need to be implemented to spin up one of these instances.

Thanks!

scottlow avatar Feb 06 '20 06:02 scottlow

FLoC (if launched) is open to the web. You can use whatever methods of creating/distributing models that you like. For instance, if you imagine that a flock is a cookie that represents many similar users instead of one, you might train your models on flocks (and the sites that those flocks visit) instead of cookies.

The interface will be an HTTPS header that comes along with the request. If you wish to receive the header, you'll need to ask for it via a client hint. The details still need to be ironed out but that's the gist of it.
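As a rough illustration of that opt-in flow, the sketch below shows a server advertising interest through the `Accept-CH` response header and then reading the value from a later request. The hint name `Sec-CH-Flock` is a placeholder invented for this example, since the actual name has not been decided, and both helper functions are hypothetical.

```python
from typing import Dict, Optional

# Placeholder name: the real hint/header name has not been specified.
FLOCK_HINT = "Sec-CH-Flock"


def opt_in_headers() -> Dict[str, str]:
    """Response headers a site might send to ask for the flock hint."""
    return {"Accept-CH": FLOCK_HINT}


def read_flock(request_headers: Dict[str, str]) -> Optional[str]:
    """Return the flock value from a request, or None if it was not sent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    value = lowered.get(FLOCK_HINT.lower())
    return value.strip() if value else None
```

A site that never sends the opt-in header would simply see `read_flock` return `None` in this sketch.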

jkarlin avatar Feb 10 '20 16:02 jkarlin

@jkarlin Thanks! Perhaps I misunderstood, but does federated learning not always require a server-side component to perform aggregation/model training at scale? I was envisioning that there would need to be some central service that continually refines and distributes the ML model based on the results of aggregated client-side training operations.

The explainer only mentions a browser-side component, however, so I wanted to check to see if I was misunderstanding anything.

scottlow avatar Feb 10 '20 20:02 scottlow

The hope is to land FLoC with local learning and clustering only. If necessary, we'll explore adding a server-side component.

jkarlin avatar Feb 11 '20 14:02 jkarlin

To cluster users, there has to be some form of information sharing. A browser isolated from the rest of the world cannot discover how its user's navigation compares to that of others. Federated learning alternatives without server-side components do exist, but they rely on peer-to-peer communication (see https://arxiv.org/abs/1610.05202). Would Google consider implementing this kind of protocol to perform flock assignment? Thanks!
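To make the peer-to-peer idea concrete, here is a minimal sketch of gossip averaging, in which peers repeatedly pair up and average a local value until everyone agrees, with no central server. It is a generic illustration and not the protocol from the linked paper; the function name, the random pairing, and the single float per peer are all assumptions made for the example.

```python
import random
from typing import List


def gossip_average(values: List[float], rounds: int, seed: int = 0) -> List[float]:
    """Simulate peers converging on a shared average without a server.

    Each round, two distinct peers are picked at random and both replace
    their local value with the mean of the pair. Requires at least 2 peers.
    """
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(state)), 2)
        mean = (state[i] + state[j]) / 2
        state[i] = state[j] = mean
    return state
```

Pairwise averaging keeps the overall mean unchanged, so every peer drifts toward the global average while only ever talking to one other peer at a time.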

sokn78 avatar Feb 11 '20 17:02 sokn78

> The interface will be an HTTPS header that comes along with the request. If you wish to receive the header, you'll need to ask for it via a client hint. The details still need to be ironed out but that's the gist of it.

It's not quite clear to me whether this means the flock header will be visible from third-party contexts. I guess so, since otherwise the use case is quite limited, but could you confirm?

Thanks.

sukria avatar Mar 10 '20 16:03 sukria

I agree with @sokn78: the only way to avoid a server-side component would be peer-to-peer learning, but based on the proposal, I'm not sure that is something you were planning?

This has to be clarified if we want FLoC to become a web standard and other browsers to join Chrome's effort. Requiring massive server-side computing could limit their engagement in this project.

> is there a world where multiple server-side components could securely/privately share their individual models to aid in the creation of one shared model?

This would be a great way to avoid having Google handle the learning entirely by itself.

rodolpheAV avatar Mar 10 '20 16:03 rodolpheAV

@jkarlin would you please share more thoughts on this topic, and on the statement "The hope is to land FLoC with local learning and clustering only"? I think the key point here is: when assigning a FLoC value to a user, does the browser communicate with the outside world (either peer to peer or with a server-side component)? I think there are concerns with either answer.

  1. If the answer is NO, it means everything is decided by the ML algorithm;
  2. If the answer is YES, it means user data could be collected/saved/used during the process, and the job just shifts from DMPs to browsers.

Given that the browser, the ML algorithm, and the biggest ad exchange platform all come from the same company, how do we make sure every player in the ads ecosystem gets the same information?

wqi1972 avatar Jul 22 '20 03:07 wqi1972

What I meant by local learning and clustering is that the browser would in fact not communicate with the outside world when determining a user's flock. The determination would happen via open-source algorithms and pretrained models available to the public. Because the decision is local to the browser, every site has access to the same information.
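For illustration only, here is one way a flock could be computed on-device with no network calls: a SimHash over the set of visited domains, so that similar browsing histories tend to land on similar IDs. The choice of SimHash, the function name `local_flock`, and the 8-bit ID size are assumptions for this sketch; the thread does not say which algorithm would actually be used.

```python
import hashlib
from typing import Iterable


def local_flock(domains: Iterable[str], bits: int = 8) -> int:
    """Derive a flock ID from visited domains, entirely on-device.

    Assumed technique: SimHash. Each domain votes +1 or -1 on every bit
    position, and the sign of each tally becomes one bit of the ID.
    Supports up to 64 bits, since 8 digest bytes are used per domain.
    """
    totals = [0] * bits
    for domain in set(domains):
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        hashed = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            totals[i] += 1 if (hashed >> i) & 1 else -1
    flock = 0
    for i, total in enumerate(totals):
        if total > 0:
            flock |= 1 << i
    return flock
```

Because the function is deterministic and depends only on local history, any two browsers running the same public code would assign the same flock to the same set of domains.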

Implementation and experimentation are still in their infancy, but this is the direction we'd like to go.

jkarlin avatar Jul 22 '20 18:07 jkarlin

Thanks for the clarification. "Open-source algorithms and pretrained models available to the public" is definitely good news for the ad ecosystem. It might raise some privacy concerns, such as some flocks having too many users assigned and others too few, because there is no communication with the outside world, but that is not much of a concern for the ad ecosystem.

If I understand correctly, each ad platform needs to develop its own model to figure out how each flock performs for different ad taxonomies in terms of click-through rate, conversion rate, etc., so that it can make ad targeting decisions based on it. This requires that the open-source algorithms don't change too often. I know it is still in its infancy, but I guess we can assume that they won't change too often, right @jkarlin?
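A minimal sketch of the simplest version of such a per-flock model, assuming the ad platform logs one `(flock, clicked)` pair per impression. The function name and event shape are made up for the example, and a real system would presumably also segment by ad category and smooth small samples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ctr_by_flock(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the click-through rate observed for each flock.

    `events` holds one (flock_id, clicked) pair per ad impression.
    """
    impressions: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    for flock, clicked in events:
        impressions[flock] += 1
        if clicked:
            clicks[flock] += 1
    return {flock: clicks[flock] / impressions[flock] for flock in impressions}
```

If the clustering algorithm changes, the flock IDs in old events no longer mean the same thing, which is why stability matters for statistics like these.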

wqi1972 avatar Jul 22 '20 20:07 wqi1972

You raise a good point about privacy. We're hoping that the client side privacy protections will be enough. If we can't guarantee that they are, we may need to perform some sort of server-side verification as well. So I don't want to bind our hands here on future options. I'm just telling you what we're aspiring to.
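To sketch what such a verification might check, the snippet below flags flocks whose membership falls under a minimum size `k`. The threshold, the function name, and the assumption that a verifier could see per-flock user counts are all hypothetical; nothing here reflects a decided design.

```python
from typing import Dict, List


def undersized_flocks(flock_sizes: Dict[str, int], k: int) -> List[str]:
    """Return the flocks with fewer than k users, sorted by ID.

    A flock shared by too few users would identify its members too
    precisely, so these are the ones a verifier might block or merge.
    """
    return sorted(flock for flock, size in flock_sizes.items() if size < k)
```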

We'll likely see rapid iteration of the algorithms in the beginning as we improve them, and we aspire to have slower updates after that. This is good for privacy as well (the more a flock changes, the more information about the user is leaked).

jkarlin avatar Jul 22 '20 20:07 jkarlin

The more frequently flocks change, the less value they have to an ad platform. An ad platform needs time to learn what the users in a flock tend to do before it can target relevant ads to them. When flocks change faster than the ad platform can learn, FLoC will be meaningless for ad targeting. I hope the algorithms won't change often after the initial period.

wqi1972 avatar Jul 23 '20 01:07 wqi1972