
Third Parties 2022

Status: Open. Opened by rviscomi 3 years ago (9 comments).



If you're interested in contributing to the Third Parties chapter of the 2022 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

  • Lead: @imeugenia
  • Authors: @imeugenia
  • Reviewers: @tunetheweb, @housseindjirdeh, @pepelsbey, @kevinfarrugia
  • Analysts: @kevinfarrugia
  • Editors: -
  • Coordinator: @siakaramalegos
About each role:
  • The content team lead is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
  • Authors are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
  • Reviewers are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
  • Analysts are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
  • Editors are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
  • The section coordinator is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors.

For an overview of how the roles work together at each phase of the project, see the Chapter Lifecycle doc.

Milestone checklist

0. Form the content team

  • [x] May 1: The content team has at least one author, reviewer, and analyst

1. Plan content

  • [x] May 15: The content team has completed the chapter outline in the draft doc

2. Gather data

  • [x] June 1: Analysts have added all necessary custom metrics and drafted a PR (example) to track query progress
  • June 1 - 15: HTTP Archive runs the June crawl

3. Validate results

  • [x] August 1: Analysts have queried all metrics and saved the output to the results sheet

4. Draft content

  • [ ] September 1: The content team has written, reviewed, and edited the chapter in the doc

5. Publication

  • [ ] September 15: The completed chapter and all required metadata and figures are converted to markdown and submitted to GitHub
  • September 26: Target launch date 🚀

Chapter resources

Refer to these 2022 Third Parties resources throughout the content creation process:

  • 📄 Google Docs for outlining and drafting content
  • 🔍 SQL files for committing the queries used during analysis
  • 📊 Google Sheets for saving the results of queries
  • 📝 Markdown file for publishing content and managing public metadata
  • 💬 #web-almanac-third-parties on Slack for team coordination

rviscomi, Apr 12 '22

Hello! I would like to be a co-author for this chapter.

imeugenia, Apr 16 '22

I can be a reviewer for this chapter 🙏

housseindjirdeh, Apr 18 '22

I can be a reviewer for this chapter

pepelsbey, Apr 20 '22

Would be happy to pitch in as analyst + reviewer if needed.

kevinfarrugia, Apr 21 '22

@kevinfarrugia awesome, thanks so much! Glad to see you contribute again!

siakaramalegos, Apr 21 '22

@imeugenia I invited @msolercanals to the HTTP Archive Team as requested.

@msolercanals check your emails for the invite and then reach out to @kevinfarrugia to coordinate analysis. He's an old hand at this so can help you out. And make sure you both send your emails to @rviscomi to get added to the HTTP Archive BigQuery account so queries are charged there.

tunetheweb, May 03 '22

ℹ️ @imeugenia @tunetheweb @housseindjirdeh @pepelsbey @kevinfarrugia @msolercanals reminder for anyone who hasn't yet accessed the chapter planning doc and added your ideas to the outline. I see a lot of great progress, so thank you to everyone who has contributed so far! Just checking in to make sure we're on track to complete the outline by May 15, to leave enough time in case there are any new metrics we need to add to the June crawl. Thanks!

rviscomi, May 06 '22

@imeugenia | @tunetheweb @housseindjirdeh @pepelsbey @kevinfarrugia | @kevinfarrugia @msolercanals just a reminder that the outline is due in 2 days. Make sure you open the doc, add your name and email address, and provide feedback on the ideas started there. Also, Eugenia, we'll need to understand which part is the final outline. Thanks!

siakaramalegos, May 13 '22

@imeugenia and team, is the outline complete? The deadline was May 15. We now have less than 2 weeks to finalize any custom metrics so getting to outline completion is critical. When it's complete, please check the milestone off. Thanks for your contributions!

siakaramalegos, May 18 '22

@kevinfarrugia it looks like the analysis is almost complete - can you give us an idea of % complete and timeline on the rest?

@imeugenia when do you think you can begin the draft? As a reminder, the end-of-month due date is for the reviewed and edited chapter, so you'll need to set aside at least a week, and preferably more, for review and editing.

siakaramalegos, Aug 12 '22

@siakaramalegos I think we're done with the queries. :)

kevinfarrugia, Aug 17 '22

I'd be happy to be a reviewer if help is needed!

alexnj, Aug 17 '22

@imeugenia when do you think you can start working on the draft? What's the plan to get back on track for completion? Thanks

siakaramalegos, Aug 19 '22

@kevinfarrugia / @imeugenia one thing came to light today. The Sustainability chapter ran some third-party SQL and got smaller results than you did. It turns out they were using the canonical domain from the third-party dataset, which groups related third-party domains together into one.

For example the following are distinct domains:

  • 3461206.fls.doubleclick.net
  • 690327.fls.doubleclick.net
  • 10180635.fls.doubleclick.net

However, they are all grouped under the adservice.google.com canonical domain.

So depending on how you count it, a site using all three is using "three third parties" (or at least "three third-party domains") or "one third party". I can see the argument for both views, though the former should probably be described as "domains".

Looking through the queries, we seem to have used the two inconsistently: sometimes one, sometimes the other. This is not @kevinfarrugia's fault. The same mistake was made last year (when I was both analyst and author) and the year before, and most queries were copied from year to year.

I don't think we should change this now, as we would lose the ability to compare with previous years. But I think we should review the graphs and make clear whether they show "Third Parties" or "Third Party domains".

WDYT?

FYI @fershad

tunetheweb, Sep 03 '22

@imeugenia Not all queries are affected. I will go through the queries and see which would need to be fixed. Then we can decide the way forward.

kevinfarrugia, Sep 03 '22

@imeugenia I have added a new sheet, number_of_canonical_third_parties_by_rank.sql, based on @tunetheweb's feedback above. From what I can see, only one query needed to change.

The difference between the new sheet and number_of_third_parties_by_rank.sql is that the former counts requests to multiple domains of the same provider as one third party (adservice.google.com), while the latter counts the number of distinct 3P domains (10180635.fls.doubleclick.net, 690327.fls.doubleclick.net, etc.). Hope that's clear.
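To make the two counting methods concrete, here is a minimal Python sketch (the actual analysis uses SQL queries run in BigQuery). It assumes a small, hypothetical host-to-canonical-domain mapping that mirrors the doubleclick.net example above, and treats any host not in the mapping as first-party; both are simplifications for illustration, not the real third-party dataset.

```python
from urllib.parse import urlsplit

# Hypothetical excerpt of a third-party dataset: maps a request host to the
# canonical domain it is grouped under. Illustrative entries only, mirroring
# the doubleclick.net example in this thread.
CANONICAL_DOMAIN = {
    "3461206.fls.doubleclick.net": "adservice.google.com",
    "690327.fls.doubleclick.net": "adservice.google.com",
    "10180635.fls.doubleclick.net": "adservice.google.com",
}


def count_third_parties(request_urls: list[str]) -> tuple[int, int]:
    """Return (distinct third-party hosts, distinct canonical third parties).

    Assumption: hosts missing from CANONICAL_DOMAIN are treated as
    first-party and skipped.
    """
    hosts = {urlsplit(url).hostname for url in request_urls}
    third_party_hosts = {h for h in hosts if h in CANONICAL_DOMAIN}
    # Collapse each third-party host into the provider it belongs to.
    canonical = {CANONICAL_DOMAIN[h] for h in third_party_hosts}
    return len(third_party_hosts), len(canonical)


requests = [
    "https://3461206.fls.doubleclick.net/activityi",
    "https://690327.fls.doubleclick.net/activityi",
    "https://10180635.fls.doubleclick.net/activityi",
    "https://10180635.fls.doubleclick.net/activityi;ord=2",  # same host again
]
print(count_third_parties(requests))  # (3, 1)
```

The same page yields "three third-party domains" under the per-domain count and "one third party" under the canonical count, which is why the chart labels need to say which one they show.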

kevinfarrugia, Sep 03 '22

Great work @kevinfarrugia !

tunetheweb, Sep 03 '22

@imeugenia @tunetheweb @housseindjirdeh @pepelsbey @alexnj from what I can tell, we only have one technical review so far, from Kevin Farrugia. Ideally we'd have more than one. Can you review in the next day or two so that Eugenia can incorporate that feedback and we can move to the editorial review step? We're already more than a week behind, so we really want to wrap this up as soon as possible. Thanks!

siakaramalegos, Sep 09 '22

Hello all, I've taken a read through the doc and left a few small editing comments :)

shantsis, Sep 15 '22