Flocks seem problematically black-box-y

As an advertiser, I may want to know: what Flocks exist? Generally, what kind of people do they represent? Who am I spending ad dollars on and why?

As a user I may want to know: what Flocks have I been put in and why? How is this changing my browsing experience – from the ads I'm seeing to the first-party content (news/social post prioritization, products, prices) I'm being served?

Many legal and public-interest entities may want to know all of the above: what Flocks exist? Why are individuals being sorted the way they are? Is this categorization fair to them as individuals, and what effects might it have on broader society?

The proposal should make clear how these sorts of questions will be answered. Machine learning and flock names like "43A7" seem to obfuscate, rather than clarify, here.

Aug 22 '19 14:08 eeeps

From an advertiser's point of view, you should think of asking "What kind of people does this flock represent?" as like asking "What kind of person does this cookie represent?" The flock or cookie doesn't come with an answer; you just need to observe behavior in order to figure it out.

It seems very likely that browsers which group people into flocks would want to offer ways for their users to understand what has happened, as you suggest.

I agree that the techniques used for in-browser clustering should be public, so that they can be studied by legal, public-interest, academic, etc. groups. It seems hard to know the details here until we figure out what sorts of clustering make sense in the first place, of course. But we should make them transparent-box-y instead.

Aug 22 '19 14:08 michaelkleber

I guess the first way to make it more transparent will be to have an answer to: what's going to determine "what kind of clustering makes sense"? What’s the ML sorting hat going to be optimizing for?

Aug 22 '19 14:08 eeeps

Exactly! And that's why we've published this Explainer: to have a conversation about the right answer to that question. It should be a joint effort, including the federated ML research community and the ad tech companies — the producers and consumers of the signal.

Aug 22 '19 14:08 michaelkleber

From an advertiser's point of view, you should think of asking "What kind of people does this flock represent?" as like asking "What kind of person does this cookie represent?" The flock or cookie doesn't come with an answer; you just need to observe behavior in order to figure it out.

Um, I think this is not a good analogy. A cookie stores conclusions about behaviour as seen by a party who, with few exceptions, has a limited view of the user. This party can mix its hypotheses with others (made by other, again limited, third parties) to narrow the target, and then use some "look-alike" algorithm (which is usually a black box), ending up with very limited, very noisy data built by mixing hypotheses from different sources. There are other parties who, on the other hand, can track the user in almost any situation and can collect very precise data about the user. "Cookies or other tracking techniques may be used to collate someone’s browsing activity across many sites" is only possible if those cookies can be seen across many sites and the data collected is of high quality. Not many actors have the presence on many sites, the processing power, or the economic muscle to actually make sense of all that.

But this is very different from having a canonical source of data about a user, defining who he is, what may be important, or what is sensitive, obtained from the full history of his behaviour. If, because I visited site A.com, somebody builds a wrong hypothesis of who I am and is willing to spend dollars showing me ads, that's fine, and it contributes to the noise generated by (limited) third parties. That data can only be seen by that third party, and mixed and matched with more poor data. But if it is a centralized, canonical source of data that reaches wrong conclusions, and anybody can see those conclusions, the problem is very different.

But there's a prior question: who wants a full profile of users? An advertiser may be interested in targeting "users who visited me in the last few days and have items in their cart", or "users looking to travel to South Asia destinations, to offer them airplane tickets". Interests of that kind can hardly be known beforehand, be large enough to define a flock, or require a full user profile.

Having a full profile is interesting if you have lots of advertisers, so they can be offered almost any kind of audience. Then it makes sense to collect or hold all kinds of data beforehand.

The following paragraph mixes ideas: "This API democratizes access to some information about an individual’s general browsing history (and thus, general interests) to any site that opts into the header. This is in contrast to today’s world, in which cookies or other tracking techniques may be used to collate someone’s browsing activity across many sites." The first part of the paragraph talks about the nature of the data. The second part, which is supposed to contrast with the first, talks about how the data is collected today. The second part could instead read: "This is in contrast to today's world, where most parties have limited, noisy data about users, and have to set up (and pay for) those mechanisms themselves".

Then: "The browser uses machine learning algorithms to develop a flock based on the sites that an individual visits. The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors." I guess it would be impossible to profile anything with just the URLs, so who is providing the other data vectors (the "content" of the page, or other factors)? Will the browser do some kind of content parsing for those URLs? Will a publisher be able to see how this algorithm categorizes his content, or any other content, to be sure of the fairness and accuracy of the inferred information?

How is the problem of profiling resolved by more accurate profiling? It's not only about uniquely identifying users (that's why flocks need to be large). If a user needs a way to see what "flocks" they've been assigned to, isn't it much simpler to just offer them a list of advertisement categories so the user can click the ones they're interested in, without resorting to complex mechanisms?

Sep 30 '19 10:09 jmrodriguez-smartclip
