pywb
Post Request Body missing in index entry
Describe the bug
YouTube videos captured with Browsertrix are not playable in pywb.
Steps to reproduce the bug
- Visit https://webarchives.rhizome.org/youtube_embeds_5_1741774579/20250312101726/https://www.youtube.com/embed/n7ky-nuw-us, or archive a YouTube page (such as https://www.youtube.com/embed/n7ky-nuw-us) with Browsertrix and add it to pywb.
- Open Dev Tools, Console
- Click play
- See the error message for the resource "https://www.youtube.com/youtubei/v1/player?prettyPrint=false": 404 Not Found.
Expected behavior
The player resource should return 200, since it is present in both the index and the WARC file, and the video should play.
Issue
The problem lies in the pywb index entry for the player resource (https://www.youtube.com/youtubei/v1/player?prettyPrint=false). The player request is a POST, and its index entry is missing the POST request body in the URL search key. Once the POST request body is added to the URL search key, the resource can be found and the video is playable.
- warc records (request/response)
- pywb index entry when indexing
- fixed pywb index entry (resource returns 200, video is playable) -- fixed replay
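To illustrate what "adding the POST request body to the URL search key" means, here is a minimal sketch using only the Python standard library. It is not pywb's or warcio's actual implementation. The `__wb_method=POST` marker follows the convention used for POST replay, but the flattening rules, the function names `flatten_json` and `post_lookup_url`, and the `videoId` body field are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qsl, urlencode


def flatten_json(value, name=""):
    """Flatten a JSON value into (name, value) pairs (simplified, assumed rules)."""
    pairs = []
    if isinstance(value, dict):
        for key, item in value.items():
            pairs.extend(flatten_json(item, key))
    elif isinstance(value, list):
        for item in value:
            pairs.extend(flatten_json(item, name))
    else:
        pairs.append((name, "" if value is None else str(value)))
    return pairs


def post_lookup_url(url, method, body, content_type):
    """Return the URL with the POST body folded into its query string.

    Hypothetical helper: non-POST requests are returned unchanged.
    """
    if method.upper() != "POST":
        return url
    if content_type.startswith("application/json"):
        pairs = flatten_json(json.loads(body))
    elif content_type.startswith("application/x-www-form-urlencoded"):
        pairs = parse_qsl(body, keep_blank_values=True)
    else:
        pairs = []  # other body types are out of scope for this sketch
    query = urlencode([("__wb_method", "POST")] + pairs)
    separator = "&" if "?" in url else "?"
    return url + separator + query


# Hypothetical request body; the real player request carries many more fields.
lookup = post_lookup_url(
    "https://www.youtube.com/youtubei/v1/player?prettyPrint=false",
    "POST",
    '{"videoId": "n7ky-nuw-us"}',
    "application/json",
)
print(lookup)
```

If the indexer writes the key from the bare URL while replay looks it up with the body appended (or the other way round), the two keys no longer match, which would explain the 404 even though the record exists in the WARC file.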
Environment
- pywb (version 2.8.0)
- Browsertrix-Crawler capture (1.5.8, with warcio.js 2.4.3)
Additional context
As described in the forum post, the ArchiveWeb.page capture of the YouTube page works fine in pywb. The issue doesn't occur there (the pywb index is written correctly). That is how the issue was found: by comparing the index entry of the working ArchiveWeb.page collection with that of the failing Browsertrix collection.
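The comparison described above can be sketched roughly as follows, assuming both collections use CDXJ index lines of the form `searchkey timestamp {json}`. The helper names and the two sample lines are hypothetical placeholders, not the actual entries from either collection.

```python
import json


def parse_cdxj_line(line):
    """Split one CDXJ line into (search key, timestamp, JSON fields)."""
    key, timestamp, fields = line.strip().split(" ", 2)
    return key, timestamp, json.loads(fields)


def search_keys_for(lines, url):
    """Collect the search keys of all entries whose 'url' field equals url."""
    keys = []
    for line in lines:
        key, _, fields = parse_cdxj_line(line)
        if fields.get("url") == url:
            keys.append(key)
    return sorted(keys)


player_url = "https://www.youtube.com/youtubei/v1/player?prettyPrint=false"

# Placeholder entries: real keys contain the full flattened request body.
working_index = [
    'com,youtube)/youtubei/v1/player?__wb_method=post&prettyprint=false&videoid=placeholder '
    '20250312101726 {"url": "' + player_url + '", "status": "200"}',
]
failing_index = [
    'com,youtube)/youtubei/v1/player?prettyprint=false '
    '20250312101726 {"url": "' + player_url + '", "status": "200"}',
]

working_keys = search_keys_for(working_index, player_url)
failing_keys = search_keys_for(failing_index, player_url)
print("working:", working_keys)
print("failing:", failing_keys)
```

Entries that share the same `url` field but have different search keys point to the kind of mismatch reported here.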
Screenshots
Failing replay, player resource not found: