Embedded fonts are not included in WARCs

Open machawk1 opened this issue 5 years ago • 3 comments

On my own site (e.g., https://matkelly.com), I reference some fonts to be included and used in the CSS of the web page, e.g.,

<link rel="preload" href="/_font/IM_FELL_English_Roman.woff2" as="font" type="font/woff2" crossorigin>

The resource resolution procedure never fetches these, so the HTML representation is affected at replay. The request for the resource does appear in the WARC.

machawk1 • Oct 06 '20 18:10

A generic query selector like document.querySelectorAll('link') will return all of the link tags in the document (head), but I am still searching for a less generic way to identify fonts, in the same spirit as the current logic that uses (e.g.) document.styleSheets for CSS.

machawk1 • Oct 06 '20 18:10

You may want to use a "not perfect but good enough" approach of matching patterns in the href, as, and/or type attribute values, using the attribute selectors of CSS Selectors in your querySelectorAll call.
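A minimal sketch of that idea, assuming font links can be recognized by as="font", a type starting with "font/", or a common font file extension in href. The names FONT_LINK_SELECTOR and findFontLinkUrls are hypothetical and not part of WARCreate; the extension list is illustrative and will miss URLs that carry query strings.

```javascript
// Hypothetical selector (not from WARCreate): matches <link> elements that
// look like fonts based on their `as`, `type`, or `href` attribute values.
const FONT_LINK_SELECTOR = [
  'link[as="font"]',       // e.g. <link rel="preload" as="font" ...>
  'link[type^="font/"]',   // e.g. type="font/woff2"
  'link[href$=".woff2"]',  // extension checks are a heuristic only
  'link[href$=".woff"]',
  'link[href$=".ttf"]',
  'link[href$=".otf"]',
].join(', ');

// Returns de-duplicated absolute URLs for font-like <link> elements.
// `doc` is a Document (or any object exposing querySelectorAll);
// `baseUrl` is used to resolve relative href values.
function findFontLinkUrls(doc, baseUrl) {
  const urls = new Set();
  for (const link of doc.querySelectorAll(FONT_LINK_SELECTOR)) {
    const href = link.getAttribute('href');
    if (!href) continue; // nothing to fetch without an href
    try {
      urls.add(new URL(href, baseUrl).href);
    } catch (e) {
      // Skip href values that cannot be parsed as a URL.
    }
  }
  return [...urls];
}
```

In a page context this could be called as findFontLinkUrls(document, document.baseURI), with the resulting URLs handed to whatever fetch step already handles other embedded resources.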

ibnesayeed • Oct 06 '20 18:10

@ibnesayeed That will be my first approach. I am still investigating if there are other resources that perhaps are missing but represented in these elements. If so, the more generic approach of querying the DOM for link elements would yield additional representations to store.
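One way to survey what else those elements reference is to group every link href by its rel value and inspect the result. This is only a sketch; groupLinkHrefsByRel is a hypothetical helper, not existing WARCreate code.

```javascript
// Hypothetical helper: groups the href values of all <link> elements by each
// token in their `rel` attribute, so missing resource types can be spotted.
function groupLinkHrefsByRel(doc) {
  const groups = new Map();
  for (const link of doc.querySelectorAll('link[href]')) {
    const href = link.getAttribute('href');
    // `rel` may hold several space-separated tokens, e.g. "shortcut icon".
    const rels = (link.getAttribute('rel') || '')
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean);
    for (const rel of rels.length ? rels : ['(none)']) {
      if (!groups.has(rel)) groups.set(rel, []);
      groups.get(rel).push(href);
    }
  }
  return groups;
}
```

Calling groupLinkHrefsByRel(document) in the page returns a Map keyed by rel token (for example preload, stylesheet, icon), which makes it easier to see which kinds of linked resources are absent from the WARC.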

machawk1 • Oct 06 '20 18:10