shot-scraper
--init-script support
Init scripts are special JavaScript that gets run to prime the page before any of its own scripts execute:
https://playwright.dev/python/docs/api/class-page#page-add-init-script
Adds a script which would be evaluated in one of the following scenarios:
- Whenever the page is navigated.
- Whenever the child frame is attached or navigated. In this case, the script is evaluated in the context of the newly attached frame.
The script is evaluated after the document was created but before any of its scripts were run. This is useful to amend the JavaScript environment, e.g. to seed Math.random.
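To make the "seed Math.random" idea concrete, here is a minimal sketch that builds an init script replacing Math.random with a small seeded generator. The generator is mulberry32, a well-known 32-bit PRNG chosen here as an assumption (nothing in Playwright requires it), and the helper name seeded_random_script is hypothetical. It only returns a JavaScript string, meant to be passed to page.add_init_script() or browser_context.add_init_script().

```python
def seeded_random_script(seed: int) -> str:
    """Return JavaScript that replaces Math.random with a deterministic PRNG.

    Hypothetical helper: the string is meant for Playwright's add_init_script(),
    so it runs after the document is created but before any page script.
    """
    if not 0 <= seed < 2**32:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    # mulberry32 (assumed choice of PRNG); Math.imul keeps products in 32 bits.
    # The IIFE keeps `state` out of the page's global scope.
    return f"""
(() => {{
  let state = {seed} >>> 0;
  Math.random = function () {{
    state = (state + 0x6D2B79F5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }};
}})();
"""
```

Calling page.add_init_script(seeded_random_script(1234)) before page.goto(...) should then make Math.random() produce the same sequence on every load of that page.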
This should be an option for the shot and javascript commands, and more.
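One way such an option might look, as a hypothetical sketch only: shot-scraper's real CLI is built on Click, so argparse is used here just to keep the example dependency-free, and the flag's shape (repeatable, applied in command-line order) is an assumption rather than the implemented behaviour. The parser and program name are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a `shot`-style command with --init-script."""
    parser = argparse.ArgumentParser(prog="shot-sketch")
    parser.add_argument("url")
    # action="append" lets the flag be repeated; values keep their order.
    parser.add_argument(
        "--init-script",
        dest="init_scripts",
        action="append",
        default=[],
        metavar="JS",
        help="JavaScript to run before any page script (repeatable)",
    )
    parser.add_argument("--user-agent", default=None)
    return parser
```

Each collected string would then be handed to add_init_script() on the browser context before the page navigates.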
One thing this can be useful for is taking screenshots of pages that detect and block headless Chrome. They often seem to do that by checking navigator.webdriver.
https://www.news.com.au/ is an example:
shot-scraper https://www.news.com.au/ -h 600
But using the prototype from https://github.com/simonw/shot-scraper/commit/fae9babee52fc109c643501dd74cb9f75d18d19b and a tip from https://stackoverflow.com/a/75771301/6083, this works:
shot-scraper https://www.news.com.au/ -h 600 \
--init-script 'delete Object.getPrototypeOf(navigator).webdriver' \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0'
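Roughly what that command asks Playwright to do, written against the Python sync API. This is a sketch under assumptions, not shot-scraper's code: the function name and the 1280-pixel width are hypothetical. It uses new_context(user_agent=..., viewport=...), BrowserContext.add_init_script(), new_page(), goto() and screenshot(). The point it illustrates is ordering: the init script is registered before goto(), so it runs before the site's own detection code.

```python
from typing import Iterable, Optional


def screenshot_with_init_scripts(
    browser,
    url: str,
    path: str,
    init_scripts: Iterable[str] = (),
    user_agent: Optional[str] = None,
    width: int = 1280,  # assumed width, not taken from shot-scraper
    height: int = 600,
) -> None:
    """Capture `url` to `path`, priming every page with `init_scripts`.

    `browser` is a Playwright sync-API Browser, e.g. p.chromium.launch().
    """
    context = browser.new_context(
        user_agent=user_agent,
        viewport={"width": width, "height": height},
    )
    try:
        # Registered on the context, so they apply to every page and frame
        # created from it, before any of the page's own scripts run.
        for script in init_scripts:
            context.add_init_script(script)
        page = context.new_page()
        page.goto(url)
        page.screenshot(path=path)
    finally:
        context.close()
```

With the real library this would be called inside `with sync_playwright() as p:` (from playwright.sync_api) after `browser = p.chromium.launch()`.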
Asked ChatGPT for more ideas of things to do with init scripts: https://chat.openai.com/share/71c5302f-bb92-4bd8-8eb3-311d855311b0
A few that I really liked:
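# Freeze the clock. This only overrides Date.now(); new Date() with no
# arguments still returns the real current time.
browser_context.add_init_script("""
Date.now = function() { return new Date('2024-01-01T00:00:00Z').getTime(); };
""")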
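# Mock responses from one API host. fetch() may be given a string, a URL
# or a Request object, so normalise it to a URL string before matching.
# The IIFE keeps originalFetch out of the page's global scope.
browser_context.add_init_script("""
(() => {
    const originalFetch = window.fetch;
    window.fetch = async function(...args) {
        const input = args[0];
        const url = input instanceof Request ? input.url : String(input);
        if (url.includes('api.example.com')) {
            return new Response(JSON.stringify({ mocked: true }), { status: 200 });
        }
        return originalFetch(...args);
    };
})();
""")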
# Pre-seed storage and a cookie for whichever origin is being loaded.
browser_context.add_init_script("""
localStorage.setItem('key', 'value');
document.cookie = 'name=value; path=/';
""")
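The localStorage idea generalizes to a helper that seeds several keys at once. A minimal sketch, assuming string values; the helper name is hypothetical. json.dumps does the escaping, so quotes or newlines in a value cannot break out of the generated JavaScript. Like any init script it runs on every navigation and in every frame, and localStorage is scoped to whichever origin is loading, so the try/catch guards frames that deny storage access.

```python
import json
from typing import Mapping


def local_storage_script(items: Mapping[str, str]) -> str:
    """Return JavaScript that writes `items` into localStorage (hypothetical helper)."""
    # json.dumps yields a valid JS object literal with safely escaped strings.
    payload = json.dumps(dict(items))
    return (
        "(() => {\n"
        f"  const items = {payload};\n"
        "  try {\n"
        "    for (const [key, value] of Object.entries(items)) {\n"
        "      localStorage.setItem(key, value);\n"
        "    }\n"
        "  } catch (e) {\n"
        "    // Some frames (e.g. sandboxed or opaque-origin ones) deny storage access.\n"
        "  }\n"
        "})();\n"
    )
```

Usage would be along the lines of browser_context.add_init_script(local_storage_script({"key": "value"})).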
Claude 3 Opus suggested "Simulate a specific device":
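# This changes only what scripts read from window.innerWidth/innerHeight;
# the real viewport, and therefore CSS media queries, stay the same size.
page.add_init_script("""
Object.defineProperty(window, 'innerWidth', {
    writable: true,
    configurable: true,
    value: 375,
});
Object.defineProperty(window, 'innerHeight', {
    writable: true,
    configurable: true,
    value: 812,
});
""")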
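For comparison, Playwright can emulate the device itself through new_context() options such as viewport, device_scale_factor, is_mobile and has_touch (it also ships ready-made descriptors in playwright.devices). Unlike the init script, this resizes the actual viewport, so layout and media queries respond too. A sketch assuming iPhone-X-like numbers (375 by 812 at 3x); the constant and helper names are hypothetical.

```python
from typing import Any, Dict

# Assumed iPhone-X-like values; adjust for the device you care about.
# Playwright documents is_mobile as unsupported in Firefox.
PHONE_CONTEXT_OPTIONS: Dict[str, Any] = {
    "viewport": {"width": 375, "height": 812},
    "device_scale_factor": 3,
    "is_mobile": True,
    "has_touch": True,
}


def new_phone_context(browser, **overrides: Any):
    """Create a context whose real viewport is phone-sized (hypothetical helper).

    Extra keyword arguments (e.g. user_agent) override or extend the defaults.
    """
    options = {**PHONE_CONTEXT_OPTIONS, **overrides}
    return browser.new_context(**options)
```

Init scripts can still be layered on top by calling add_init_script() on the returned context.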