Multiple attribution domains under one eTLD+1

Open shigeki opened this issue 4 years ago • 7 comments

Note that words here follow old terminology.

In Chrome 89, a conversion destination is stored as a schemeful site (scheme plus eTLD+1). We have multiple service domains under one eTLD+1, and this change affects our conversion measurements.

In the following example, example.jp runs two services, shopping.example.jp and travel.example.jp, and each service has its own impression and conversion URLs. As the figure below shows, the credit of 100 is attributed to the impression for travel.example.jp even though the conversion happened only on order.shopping.example.jp.

[Figure: issue]

To resolve this, it would be better to store the FQDN of the conversion origin in the conversion destination and to select which conversions to report by conversion origin. However, that might increase the risk of user tracking.

Alternatively, I think we could introduce a new attribute, a domain ID, to separate impressions and conversions under one eTLD+1, and reduce the maximum impression data size accordingly so that the overall entropy stays the same.

Or can #114 solve this issue by filtering conversion data?

shigeki avatar Mar 08 '21 13:03 shigeki

Thanks for filing! We moved from origin-scoped attribution on the destination to eTLD+1-scoped attribution so that landing pages and conversion pages can be on separate origins. This helps when, for instance, the landing page is on shoes.com but the conversion happens on purchase.shoes.com.

Introducing an opt-in for tighter attribution scoping is useful. The API currently offers one way to do this: since attribution is scoped to an <attributeon, reportto> pair, you can shard the reporting origin per destination. For example, you could make https://travel.example.jp the configured reporting origin for travel and https://shopping.example.jp the configured reporting origin for shopping.
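To make the sharding idea concrete, here is a minimal Python sketch of matching on the (destination site, reporting origin) pair. It is only an illustration of the scoping described above, not the browser's implementation. The shared reporting origin https://ads.example.jp, the two-entry suffix set standing in for the Public Suffix List, and all function and field names are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Assumption: toy stand-in for the Public Suffix List, just enough for this thread.
PUBLIC_SUFFIXES = {"jp", "com"}


def schemeful_site(url):
    """Return scheme + eTLD+1 for a URL, using the toy suffix set above."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Try suffixes from longest to shortest; the first public suffix found wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            site = ".".join(labels[max(i - 1, 0):])
            return parts.scheme + "://" + site
    return parts.scheme + "://" + parts.hostname


def scope_key(destination_url, reporting_origin):
    """Attribution scope: the (destination site, reporting origin) pair."""
    return (schemeful_site(destination_url), reporting_origin)


def matching_impressions(impressions, conversion_url, reporting_origin):
    """Ids of stored impressions that a conversion could be attributed to."""
    key = scope_key(conversion_url, reporting_origin)
    return [
        imp["id"]
        for imp in impressions
        if scope_key(imp["destination"], imp["reporting_origin"]) == key
    ]


CONVERSION_URL = "https://order.shopping.example.jp/thanks"

# One shared reporting origin (hypothetical): both impressions are candidates.
shared = [
    {"id": "travel", "destination": "https://travel.example.jp",
     "reporting_origin": "https://ads.example.jp"},
    {"id": "shopping", "destination": "https://shopping.example.jp",
     "reporting_origin": "https://ads.example.jp"},
]
shared_matches = matching_impressions(
    shared, CONVERSION_URL, "https://ads.example.jp")

# Reporting origin sharded per service: only the shopping impression matches.
sharded = [
    {"id": "travel", "destination": "https://travel.example.jp",
     "reporting_origin": "https://travel.example.jp"},
    {"id": "shopping", "destination": "https://shopping.example.jp",
     "reporting_origin": "https://shopping.example.jp"},
]
sharded_matches = matching_impressions(
    sharded, CONVERSION_URL, "https://shopping.example.jp")
```

With the shared origin, both destinations collapse to https://example.jp, so the travel impression stays eligible for a shopping conversion. Sharding the reporting origin separates the two services without changing how the destination is stored.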

I am not opposed in general to adding other mechanisms of opting in (like the conversion-filters proposal). This may also be something that is configurable in an attribution worklet (issue #114).

csharrison avatar Mar 08 '21 15:03 csharrison

In our current origin trial, we have only one domain for reporting since it is shared with our production ad services, so we will evaluate how much this issue affects our conversions through trials. The attribution worklet seems to be more flexible, and we are looking forward to having it. Thanks.

shigeki avatar Mar 08 '21 23:03 shigeki

Note the attack I described in https://github.com/privacycg/private-click-measurement/issues/57 (the same analysis goes for the destination website):

Why Not Attribution Reports To Subdomains?

Some have requested that attribution reports be sent to the full domain of the site where the click happens and similarly the full domain of the site where the conversion happens.

Neither of these meet our privacy requirements. In both cases, subdomains can be chosen to convey further information about the click or conversion.

Imagine for instance social.example where the ad click happens making sure the site is loaded from the subdomain johnwilander.social.example when I'm logged in there and from the subdomain janedoe.social.example when Jane Doe is logged in. That would take us back to cross-site tracking in the subsequent report.

The reason for restricting PCM reports to registrable domains is that the scheme+registrable domain, a.k.a. schemeful site, is the only part of a URL that is free from link decoration. All other parts can be made user specific, including subdomains.

You could of course imagine social.example setting up a registrable domain per user, such as johnwilander-social.example, and load the whole website from that domain when I'm logged in to get back to cross-site tracking of clicks. If that happens, we'd have to deal with it but at least the user has a chance to see that a personalized domain is used through the URL bar.
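The difference between the two cases in the quoted text can be sketched in a few lines of Python. This is an illustration only: it treats `.example` as a one-label public suffix, and janedoe-social.example is a hypothetical counterpart to the per-user domain named above.

```python
from urllib.parse import urlsplit


def schemeful_site(url, suffix_labels=1):
    """Scheme + registrable domain.

    Assumption: the public suffix has `suffix_labels` labels (1 for ".example").
    A real implementation would consult the Public Suffix List instead.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    registrable = ".".join(labels[-(suffix_labels + 1):])
    return parts.scheme + "://" + registrable


# Per-user subdomains: the user-specific label is dropped from the report.
john_sub = schemeful_site("https://johnwilander.social.example/ad?id=1")
jane_sub = schemeful_site("https://janedoe.social.example/ad?id=1")

# Per-user registrable domains: the user-specific part survives,
# but it is also what the user sees in the URL bar.
john_reg = schemeful_site("https://johnwilander-social.example/ad?id=1")
jane_reg = schemeful_site("https://janedoe-social.example/ad?id=1")
```

Here `john_sub` and `jane_sub` are both https://social.example, while `john_reg` and `jane_reg` remain distinct.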

johnwilander avatar Mar 08 '21 23:03 johnwilander

@johnwilander Thanks for the explanation. I had not considered the privacy risk of link decoration, and I agree with your point.

shigeki avatar Mar 08 '21 23:03 shigeki

> The reason for restricting PCM reports to registrable domains is that the scheme+registrable domain, a.k.a. schemeful site, is the only part of a URL that is free from link decoration. All other parts can be made user specific, including subdomains.

Is this actually true? Couldn't a site gain e.g. 8 more bits of entropy by registering 256 public domains and funneling their conversions through those domains based on some sort of user ID? I'm not 100% certain on whether this would be a viable attack on this protocol, since I don't understand the threat model entirely, but it does seem to have similar problems at scale to the subdomain issue you talk about.
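The arithmetic behind the "8 more bits" figure, as a small Python sketch. It assumes users are spread evenly across the registered domains; the modulo mapping from a user ID to a domain is a hypothetical scheme chosen only for illustration.

```python
import math


def bits_from_domains(domain_count):
    """Bits of information conveyed by which of `domain_count` domains was used."""
    return math.log2(domain_count)


def domains_needed(bits):
    """Number of registered domains required to convey `bits` bits."""
    return 2 ** bits


def domain_index(user_id, domain_count):
    """Hypothetical mapping: pick a domain from the low bits of a user ID."""
    return user_id % domain_count


bits_256 = bits_from_domains(256)    # 8.0
needed_for_8 = domains_needed(8)     # 256
bucket = domain_index(1000, 256)     # 232
```

Each extra bit doubles the number of domains that must be registered, so the cost of this approach grows exponentially with the amount of information carried.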

nightpool avatar Oct 31 '21 22:10 nightpool

> > The reason for restricting PCM reports to registrable domains is that the scheme+registrable domain, a.k.a. schemeful site, is the only part of a URL that is free from link decoration. All other parts can be made user specific, including subdomains.
>
> Is this actually true? Couldn't a site gain e.g. 8 more bits of entropy by registering 256 public domains and funneling their conversions through those domains based on some sort of user ID? I'm not 100% certain on whether this would be a viable attack on this protocol, since I don't understand the threat model entirely, but it does seem to have similar problems at scale to the subdomain issue you talk about.

If you want to discuss PCM, you can use its repo here: https://github.com/privacycg/private-click-measurement/issues

On this specific issue, there are three important differences between different subdomains on a single eTLD+1, and different eTLD+1s:

  • Subdomains can share cookies whereas the eTLD+1s cannot.
  • If the user re-engages a couple of days later, they won't end up on the specific eTLD+1 set up for them without extra tracking powers or pure luck (they kept the tab and found it).
  • eTLD+1 typically carries branding. If you land on a randomized domain to buy something, you won't have brand recognition and you might get suspicious that this is a scam.

johnwilander avatar Nov 01 '21 05:11 johnwilander

@johnwilander

rob123ui avatar Nov 13 '21 17:11 rob123ui