botasaurus
The All in One Framework to build Awesome Scrapers.
Starting with Chrome version 137, Google has deprecated the --load-extension command-line flag in branded Chrome builds, citing security concerns. This change is impacting Botasaurus. Reference: Google's official documentation on Chrome...
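To make the version cutoff above concrete, here is a minimal sketch of a check for whether a given browser build falls in the affected range. It is not part of Botasaurus: the function names and the `branded` parameter are hypothetical, and it assumes a version string in the four-part form that Chrome prints for `--version` (for example `Google Chrome 137.0.7151.55`).

```python
import re
from typing import Optional

# Cutoff taken from the issue text: the flag is deprecated from Chrome 137 onward.
LOAD_EXTENSION_CUTOFF = 137


def chrome_major_version(version_string: str) -> Optional[int]:
    """Return the major version from a four-part version string, or None."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_string)
    return int(match.group(1)) if match else None


def load_extension_flag_affected(version_string: str, branded: bool = True) -> bool:
    """Hypothetical helper: True for branded builds at or above the cutoff."""
    major = chrome_major_version(version_string)
    if major is None:
        # Unknown format: make no claim about the flag.
        return False
    return branded and major >= LOAD_EXTENSION_CUTOFF
```

A caller could use a check like this to decide whether to fall back to a non-branded build or to another way of loading extensions; which fallback is appropriate is outside what the issue states.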
Avoid npm errors during install due to some deprecated things
## Issue Description

When using Botasaurus in `parallel=True` mode with a Chrome profile, the driver systematically fails to connect to Chrome's debugging port, resulting in multiple empty browser windows being...
I am running botasaurus with chromium with the following settings ``` @browser( output=None, create_error_logs=False, headless=True, close_on_crash=True, block_images=True, add_arguments=[ "--no-sandbox", "--allow-insecure-localhost", "--disable-web-security", "--disable-features=IsolateOrigins,site-per-process", "--disable-blink-features=AutomationControlled", "--disable-crash-reporter", "--disable-breakpad", ], raise_exception=True, user_agent=dynamic_user_agent, window_size=WindowSize.RANDOM, wait_for_complete_page_load=False,...
from botasaurus.browser import browser, Driver from botasaurus.user_agent import UserAgent from botasaurus.window_size import WindowSize from botasaurus.request import request, Request @request( user_agent=UserAgent.google_bot, ) def visit_whatsmyua(driver: Request, data): response=driver.get("https://www.Loopnet.com/") print(response.status_code) visit_whatsmyua() eaders, response_object['response'],...
from botasaurus.request import Request -> class Request(Session): def __init__(self, proxy=None, user_agent=None): self._proxy = proxy self._user_agent = user_agent FIX: class Request(Session): def __init__(self, proxy=None, user_agent=None): super().__init__() #HERE self._proxy = proxy self._user_agent...
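The reason the one-line fix above matters can be shown with a small, self-contained sketch. The `Session` class here is a hypothetical stand-in, not the real parent class used by `botasaurus.request`, and the `headers` attribute is only an assumed example of state that a parent `__init__` might create.

```python
class Session:
    """Hypothetical stand-in for the parent session class."""

    def __init__(self):
        # State that subclasses rely on is created in the parent initializer.
        self.headers = {}


class BrokenRequest(Session):
    def __init__(self, proxy=None, user_agent=None):
        # The parent __init__ is never called, so self.headers is never created.
        self._proxy = proxy
        self._user_agent = user_agent


class FixedRequest(Session):
    def __init__(self, proxy=None, user_agent=None):
        super().__init__()  # run the parent initializer first
        self._proxy = proxy
        self._user_agent = user_agent


def has_parent_state(obj) -> bool:
    """Report whether the parent's attributes were initialized on obj."""
    return hasattr(obj, "headers")
```

In this sketch, any parent method that touches `self.headers` would raise `AttributeError` on a `BrokenRequest` instance, while `FixedRequest` behaves like a fully initialized `Session`.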
raise ClientException(response_object['body']) botasaurus_requests.exceptions.ClientException: failed to do request: Get "https://Loopnet.com/robots.txt": stream error: stream ID 1; INTERNAL_ERROR Task failed for input: {'url': 'https://Loopnet.com'} Why am I getting this error? I am simply requesting https://Loopnet.com.
This code worked on earlier versions of Chrome. Specifically, I just tested on Chrome 132 and it doesn't produce an error: ``` from botasaurus.browser import browser, Driver @browser(output=None) def run_first(driver: Driver,...
This is a function for future reference, for anyone who has struggled with tab switching ``` # tab_switcher.py (extract) from typing import Any, Optional, Sequence import time import logging import...
Trying this for monitoring requests ``` @browser() def scrape_responses_task(driver: Driver, data): # Define a handler function that will be called after a response is received def after_response_handler( request_id: str, response:...