httparchive.org

Missing font_details

Open rviscomi opened this issue 4 years ago • 6 comments

In the March 2021 requests table, about 7% of fonts are missing the _font_details property.

SELECT
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) AS null_font_details,
  COUNT(0) AS all_fonts,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) / COUNT(0) AS pct_null_font_details
FROM
  `httparchive.requests.2021_03_01_desktop`
WHERE
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'

null_font_details | all_fonts  | pct_null_font_details
1,978,495         | 27,740,971 | 7.13%

The expected behavior is for all fonts to have a _font_details property.

cc @rsheeter

rviscomi, Apr 19 '21 18:04

For comparison, in August 2020 13% of font requests were missing _font_details, so this isn't necessarily a new issue and it may even be getting better.

rviscomi, Apr 19 '21 18:04

The case that seems particularly odd is where the status is 200 but there are no font details. I tried a few URLs (for Google Fonts, fonts.gstatic.com) that returned status 200 with no font details; they worked and seemed to load into fonttools just fine.

select 
    status,
    countif(has_font_details) has_font,
    countif(not has_font_details) no_font,
    countif(has_font_details) / count(0) pct_has_font
from (
    select
        JSON_EXTRACT_SCALAR(payload, '$.response.status') status,
        JSON_EXTRACT(payload, '$._font_details') is not null has_font_details
    from
        `httparchive.requests.2021_03_01_desktop`
    where
        JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
) t
group by status
;

This reports that 93.5% of font requests with a 200 response have font details.

If I add the condition "and net.host(url) = 'fonts.gstatic.com'" to the where clause to filter to Google Fonts, then 98.6% of responses with a 200 status have font details.
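
For reference, the Google Fonts variant described here might look roughly like the sketch below. It is a flattened rewrite of the status breakdown with the extra host condition, not the exact query that was run; it assumes the requests table exposes a `url` column, as the net.host(url) filter implies.

```sql
-- Sketch: share of font requests with _font_details, by response status,
-- restricted to Google Fonts (fonts.gstatic.com).
-- Assumption: the table has a `url` column usable with NET.HOST.
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.response.status') AS status,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NOT NULL) AS has_font,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) AS no_font,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NOT NULL) / COUNT(0) AS pct_has_font
FROM
  `httparchive.requests.2021_03_01_desktop`
WHERE
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
  AND NET.HOST(url) = 'fonts.gstatic.com'
GROUP BY
  status
ORDER BY
  status
```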

rsheeter, Apr 19 '21 19:04

cc @drott

rsheeter, Apr 19 '21 19:04

Yeah, to set expectations, it will never be 100%. The font details can only be pulled from requests that WPT managed to get the font bodies from (directly out of Chrome) and for URLs that it knows about without having to look at the netlog. For that second grouping, that would include any fonts that are pushed with HTTP/2 push but never used by the page (I assume infrequent but could be surprised).

If you have a few page URLs where it was expected to have the font details but where they weren't available it would also help.
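
One possible way to collect such examples is sketched below: it samples font requests that returned 200 but carry no _font_details. It assumes the requests table exposes `page` and `url` columns alongside `payload`; the LIMIT is arbitrary and, without an ORDER BY, the rows returned are not a stable sample.

```sql
-- Sketch: list a few pages and font URLs where the response status was 200
-- but _font_details is absent from the payload.
-- Assumption: the table has `page` and `url` columns.
SELECT
  page,
  url
FROM
  `httparchive.requests.2021_03_01_desktop`
WHERE
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
  AND JSON_EXTRACT_SCALAR(payload, '$.response.status') = '200'
  AND JSON_EXTRACT(payload, '$._font_details') IS NULL
LIMIT 10
```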

pmeenan, Apr 19 '21 19:04

For clarification, I don't think the issue is with processing the font files themselves with fonttools but rather that we don't have the raw font file at the time of the analysis.

pmeenan, Apr 19 '21 19:04

If it's normal to occasionally read the metadata successfully, observe a 200 status in it, and yet not be able to get the font file, that could explain what we're seeing. Out of curiosity, why do we find ourselves in that situation? Also, could we mark those records so we can readily filter them out?

rsheeter, Apr 19 '21 22:04