Scraping issue with normal website
When I try to scrape https://geniusee.com/single-blog/fintech-regulation-legal-and-regulatory-aspects with the browser decorator, Botasaurus just pauses there. When I analysed it, I found that I had turned off loading of images and CSS, but the target website tries to send the images anyway. This becomes an infinite loop. Please resolve this. Thank you.
I've tried with this code and everything worked:
from botasaurus.browser import browser, Driver
from botasaurus.lang import Lang

@browser(output=None, headless=False, lang=Lang.Italian, add_arguments=['--disable-notifications'])
def scrape_heading_task(driver: Driver, data):
    driver.get("https://geniusee.com/single-blog/fintech-regulation-legal-and-regulatory-aspects", timeout=10)
    driver.short_random_sleep()
    driver.save_screenshot("test.png")
    print(driver.current_url)
    print(driver.page_text)

# Initiate the web scraping task
scrape_heading_task()
Did you try it with images and CSS disabled?
Okay, I tried this code:
from botasaurus.browser import browser, Driver
from botasaurus.lang import Lang

@browser(output=None, headless=False, lang=Lang.Italian, add_arguments=['--disable-notifications'], block_images_and_css=True)
def scrape_heading_task(driver: Driver, data):
    driver.get("https://geniusee.com/single-blog/fintech-regulation-legal-and-regulatory-aspects")
    driver.short_random_sleep()
    driver.save_screenshot("test.png")
    print(driver.current_url)
    print(driver.page_text)

# Initiate the web scraping task
scrape_heading_task()
The page load in the browser never finishes; some JavaScript code on the page waits for the images to load.
TimeoutError: Document did not become ready within 60 seconds
Task failed for input: None
I can't help you here, unfortunately. I opened issue #258 for a similar case.
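Since the page's "document ready" state may never arrive when images are blocked, one generic workaround pattern is to enforce your own deadline and proceed with whatever has loaded. This is an illustrative sketch only, not a botasaurus API: load_with_deadline is a hypothetical helper, and the page-load callable is a stand-in for the real driver call.

import concurrent.futures as cf

def load_with_deadline(load_page, deadline_s):
    """Run a page-load callable, but give up after deadline_s seconds and
    return None so the caller can proceed with partial content instead of
    waiting forever for the page to become 'ready'."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(load_page)
    try:
        # Returns the callable's result if it finishes in time.
        return future.result(timeout=deadline_s)
    except cf.TimeoutError:
        return None  # the page never became ready; carry on anyway
    finally:
        # Don't block on the stuck worker thread.
        pool.shutdown(wait=False)

With botasaurus, the analogous approach would be catching the TimeoutError raised by driver.get and then still reading driver.page_text, assuming the driver can continue after a load timeout (an untested assumption, and issue #258 suggests it may not currently work).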