
[Bug]: Unexpected error in _crawl_web

Open Martichou opened this issue 2 weeks ago • 5 comments

crawl4ai version

0.7.7

Expected Behavior

Should parse the webpage correctly.

Current Behavior

When crawling this page: https://www.toshiba-lifestyle.com/th-en/blog/how-to-choose-the-right-laundry-product-for-you

I get the following error:

[ERROR]... × https://www.toshiba-lif...laundry-product-for-you  | Error:
Unexpected error in _crawl_web at line 493 in aprocess_html (../usr/local/lib/python3.12/site-packages/crawl4ai/async_webcrawler.py):
Error: Process HTML, Failed to extract content from the website: https://www.toshiba-lifestyle.com/th-en/blog/how-to-choose-the-right-laundry-product-for-you, error: 1 validation error for MediaItem
width
  Input should be a valid integer, unable to parse string as an integer
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing

Code context:
 488                   )
 489
 490           except InvalidCSSSelectorError as e:
 491               raise ValueError(str(e))
 492           except Exception as e:
 493 →             raise ValueError(
 494                   f"Process HTML, Failed to extract content from the website: {url}, error: {str(e)}"
 495               )
 496
 497           # Extract results - handle both dict and ScrapingResult
 498           if isinstance(result, dict):

Seems like some media element on that page has a width value that is not a plain integer, so the MediaItem validation fails and the whole crawl is aborted.
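For illustration, here is a minimal sketch of how a width value could be parsed tolerantly instead of raising. It assumes the offending value is something like "100%" or "auto", which I have not verified against the page. `parse_dimension` is a hypothetical helper, not part of crawl4ai, and this is not necessarily how the library should fix it.

```python
import re
from typing import Optional

# Accepts plain integers and pixel values such as "300", "300px" or "300.5px".
# Anything else ("100%", "auto", "") is treated as unknown.
_PIXEL_VALUE = re.compile(r"^\s*(\d+)(?:\.\d+)?\s*(?:px)?\s*$", re.IGNORECASE)


def parse_dimension(value: object) -> Optional[int]:
    """Hypothetical helper: return an int width/height, or None if unparseable."""
    if isinstance(value, bool):
        # bool is a subclass of int, but it is never a meaningful dimension.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = _PIXEL_VALUE.match(value)
        if match:
            # A fractional part, if present, is dropped (truncation).
            return int(match.group(1))
    return None


if __name__ == "__main__":
    for raw in ["300", "300px", "100%", "auto", 640, None]:
        print(repr(raw), "->", parse_dimension(raw))
```

With something like this applied before validation, an unparseable width would become None rather than failing the whole page, provided the field is allowed to be optional.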

Martichou, Nov 24 '25 09:11