pywb
Client-side replay: Google Drive PDFs only replay first page
Describe the bug
In client_side_replay mode, replaying harvested PDFs that were hosted on Google Drive displays only the first page. The same PDFs replay correctly, with all pages, in Browsertrix and ArchiveWeb.page.
Steps to reproduce the bug
1. Using Browsertrix or ArchiveWeb.page, crawl a page that contains links to PDF documents in Google Drive.
2. Compare replay of the Google Drive PDF links in Browsertrix or ArchiveWeb.page with replay in Pywb in client_side_replay mode.

Example:
- Page: https://www.livingwage.org.nz/reports_and_research
- Google Drive links: 'Annual Report 2023-2024', 'Annual Report 2022-2023'
Expected behavior
In client_side_replay mode, Pywb should render all pages of the PDF.
Screenshots
Pywb: 2.9.0 Beta 0
Replay Webpage
Environment
- OS: Ubuntu
- Browser: Firefox and Chrome
- Pywb: 2.9.0 Beta 0