floc
Reduce dimensionality of the FLoC set with a Bloom filter to reduce privacy budget impact
It seems a reasonably compact Bloom filter could express the entire set of common interests while reducing the entropy exposed. Does this seem compatible with the FLoC project goals and envisioned implementation?
One enhancement to the Bloom filter for this purpose could be a TTL on each element, but I am not sure whether the theoretical properties of such a filter, with positions expiring at different moments, have been well established.
The advantage of the Bloom filter is that it also preserves privacy. When you test the filter for a segment, a false result means the user definitely does not have the segment, while a true result means only that the user may have it. In this sense the Bloom filter reduces media waste for the advertiser but prevents technology companies from being certain a user has an interest. The false positive rate is determined by the size of the filter, the number of hash functions, and the number of interests inserted into it; the last is bounded by the number of interests that can be expressed (it seems the IAB taxonomy committee has a role to play here).
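To make the one-sided guarantee concrete, here is a minimal sketch of a Bloom filter in Python. The filter size `m`, the hash count `k`, the salted SHA-256 hashing scheme, and the segment names are all assumptions chosen for illustration; nothing here reflects an actual FLoC or IAB design.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index
        # (an illustrative choice, not a recommendation).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means only possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for n inserted elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Hypothetical interest segments, not real taxonomy identifiers.
user_interests = ["cycling", "cooking", "jazz"]
bf = BloomFilter(m=64, k=3)
for segment in user_interests:
    bf.add(segment)
```

Every inserted segment tests true, and `expected_false_positive_rate` shows how the ambiguity grows as more interests are packed into the same number of bits.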

Here is a paper on the theoretical properties of a decaying Bloom filter:
https://openproceedings.org/2013/conf/edbt/DautrichR13a.pdf
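For illustration only, a TTL variant could store an expiry time in each slot instead of a single bit. The sketch below is my own simplification and is not the construction from the linked paper; the class name, the explicit `now` argument, and the unit of time are assumptions.

```python
import hashlib


class ExpiringBloomFilter:
    """Bloom filter variant where each slot holds an expiry time."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.expiry = [0.0] * m  # 0.0 means the slot was never set

    def _positions(self, item: str):
        # Same illustrative salted SHA-256 scheme as a plain Bloom filter.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str, now: float, ttl: float) -> None:
        for pos in self._positions(item):
            # Keep the latest expiry, so a slot shared by several elements
            # stays set until the longest-lived of them expires.
            self.expiry[pos] = max(self.expiry[pos], now + ttl)

    def might_contain(self, item: str, now: float) -> bool:
        # A slot counts as set only while its expiry lies in the future.
        return all(self.expiry[pos] > now for pos in self._positions(item))


# Hypothetical usage: time is measured in days.
ebf = ExpiringBloomFilter(m=64, k=3)
ebf.add("cycling", now=0.0, ttl=7.0)
```

Because a shared slot keeps the later expiry, an element is never reported absent before its own TTL runs out, but it can linger as a false positive afterwards if its slots are held open by other elements.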
There are some plausible ways of building flocks that start with page taxonomies, as you suggest. It seems like it would require on-device ML to figure out the taxonomy of each page (in many languages!) — not impossible, but obviously additional work.
But I don't think the one-sided false-positive properties of a Bloom filter are particularly helpful here. For example, this would mean a person's flock would give out the definite signal "I have not visited a page about Topic T in the last D days." And if a person visits the same site regularly and that site sees their flock change, it's very likely that the change will indicate exactly which new topic they visited a page about in between.
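Both leaks can be sketched in a few lines of Python. The filter parameters `M` and `K`, the hashing scheme, and the topic names are hypothetical; the point is only that a site holding two snapshots of the same filter can rule topics out with certainty and narrow down what was added.

```python
import hashlib

M, K = 256, 3  # hypothetical filter size and hash count


def positions(topic: str) -> frozenset:
    """Bit positions a topic maps to (SHA-256 salted with the hash index)."""
    return frozenset(
        int.from_bytes(
            hashlib.sha256(f"{i}:{topic}".encode("utf-8")).digest()[:8], "big"
        ) % M
        for i in range(K)
    )


def encode(topics) -> frozenset:
    """The set of bit positions that are set after inserting every topic."""
    bits = set()
    for topic in topics:
        bits |= positions(topic)
    return frozenset(bits)


def definitely_absent(snapshot, taxonomy):
    # Any topic with at least one unset position was certainly not inserted.
    return [t for t in taxonomy if not positions(t) <= snapshot]


def candidate_new_topics(before, after, taxonomy):
    # Topics that pass the membership test and touch a newly set bit.
    newly_set = after - before
    return [
        t for t in taxonomy
        if positions(t) <= after and positions(t) & newly_set
    ]


taxonomy = ["cycling", "cooking", "jazz", "golf", "travel", "finance"]
before = encode(["cycling", "cooking"])
after = encode(["cycling", "cooking", "jazz"])
ruled_out = definitely_absent(after, taxonomy)
suspects = candidate_new_topics(before, after, taxonomy)
```

With a sparse filter and a known taxonomy, `suspects` will usually be a very short list, which is the concern described above.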
We plan to experiment with a range of approaches to clustering.