
blink_features.usage has null rank column

Open tunetheweb opened this issue 4 years ago • 3 comments

Since we have this column, can we populate it with the new CrUX ranking? Leaving it null is confusing and makes joins more difficult, because you need an extra join to the summary_pages table to get the ranking.
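To illustrate the extra join, here is a minimal sketch that uses an in-memory SQLite database in Python as a stand-in for BigQuery. The `features` and `summary_pages` schemas and the sample rows are assumptions made for the example, not the real HTTP Archive tables.

```python
import sqlite3

# In-memory stand-in for the BigQuery tables; schemas and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (url TEXT, feature TEXT, rank INTEGER);  -- rank is NULL today
CREATE TABLE summary_pages (url TEXT, rank INTEGER);
INSERT INTO features VALUES
  ('https://a.example/', 'CSSGridLayout', NULL),
  ('https://b.example/', 'CSSGridLayout', NULL);
INSERT INTO summary_pages VALUES
  ('https://a.example/', 1000),
  ('https://b.example/', 100000);
""")

# Because features.rank is NULL, the rank has to be pulled in from summary_pages.
rows_with_rank = conn.execute("""
SELECT f.url, f.feature, p.rank
FROM features AS f
JOIN summary_pages AS p USING (url)
ORDER BY p.rank
""").fetchall()

for row in rows_with_rank:
    print(row)
```

If the rank column were populated at generation time, the `JOIN` would not be needed.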

@rviscomi / @pmeenan I'm not sure what populates this table, so where would this change need to be made?

tunetheweb avatar Jul 19 '21 11:07 tunetheweb

There's a pair of "materialize blink features" scheduled queries at the project level in BigQuery that generate the blink_features tables on the 1st of the month. We'd need to edit these queries to output the ranking info.

rviscomi avatar Jul 26 '21 20:07 rviscomi

Can't find where these are. Can you give me a pointer?

Also, why the first of the month? I thought these were based off the crawl, so why aren't they generated with the rest of them?

tunetheweb avatar Jul 29 '21 08:07 tunetheweb

> Can't find where these are. Can you give me a pointer?

In the BigQuery console, the left panel has a "Scheduled queries" entry: https://console.cloud.google.com/bigquery/scheduled-queries?project=httparchive

> Also, why the first of the month? I thought these were based off the crawl, so why aren't they generated with the rest of them?

We could look into doing that, similar to how we generate the technologies table. But we would need to be more careful since this table is appended to each month and repeated Dataflow jobs might result in duplicate data.
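One common way to keep an appended table safe against reruns is to replace the rows for a given crawl date instead of blindly inserting them. The sketch below shows that idea with SQLite in Python. It is not how the scheduled queries or Dataflow jobs currently work, and the `usage` schema and the `append_month` helper are hypothetical.

```python
import sqlite3

def append_month(conn, yyyymmdd, rows):
    """Replace one crawl date's rows so that reruns do not duplicate data."""
    with conn:  # single transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM usage WHERE yyyymmdd = ?", (yyyymmdd,))
        conn.executemany(
            "INSERT INTO usage (yyyymmdd, feature, num_urls) VALUES (?, ?, ?)",
            [(yyyymmdd, feature, num_urls) for feature, num_urls in rows],
        )

# Hypothetical, simplified schema standing in for the real usage table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (yyyymmdd TEXT, feature TEXT, num_urls INTEGER)")

july = [("CSSGridLayout", 120), ("WebShare", 7)]
append_month(conn, "20210701", july)
append_month(conn, "20210701", july)  # simulated repeated job for the same month

row_count = conn.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
print(row_count)
```

Running the same month twice leaves one copy of its rows, so a repeated job would be harmless under this scheme.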

I've updated the features table to include rank info, starting with the July dataset, which will be processed in a couple of days. We would still need to aggregate pages by rank for the usage table and update consumers of the table accordingly. The most notable consumer is chromestatus.com, but there may be other queries floating around.
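As a rough sketch of what aggregating pages by rank could look like, the example below groups per-page feature rows by rank and feature, again using SQLite in Python as a stand-in. The column names and sample data are assumptions. It counts pages per exact rank value, which may differ from how the real usage table would bucket ranks.

```python
import sqlite3

# Hypothetical per-page rows: one row per (page, feature), with the page's rank.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (url TEXT, feature TEXT, rank INTEGER);
INSERT INTO features VALUES
  ('https://a.example/', 'CSSGridLayout', 1000),
  ('https://a.example/', 'WebShare', 1000),
  ('https://b.example/', 'CSSGridLayout', 10000),
  ('https://c.example/', 'CSSGridLayout', 10000);
""")

# Count distinct pages per (rank, feature), the shape a rank-aware usage row would need.
usage_by_rank = conn.execute("""
SELECT rank, feature, COUNT(DISTINCT url) AS num_urls
FROM features
GROUP BY rank, feature
ORDER BY rank, feature
""").fetchall()

for row in usage_by_rank:
    print(row)
```

Any consumer that currently expects one row per feature would then need to filter or sum over rank.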

rviscomi avatar Jul 30 '21 21:07 rviscomi