
Can't get screenshot working

Open pleomax0730 opened this issue 1 year ago • 1 comment

Environment

System: Windows 11
Python version: 3.10.15
crawl4ai version: 0.3.5

Code to reproduce

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
import base64
import asyncio


crawler_strategy = AsyncPlaywrightCrawlerStrategy(
    verbose=True,
    headless=True,
)


async def main():
    async with AsyncWebCrawler(verbose=True, crawler_strategy=crawler_strategy) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business", bypass_cache=True, screenshot=True
        )
        print(result.markdown)
        # Save the screenshot to a file
        with open("screenshot.png", "wb") as f:
            f.write(base64.b64decode(result.screenshot))

        print("Screenshot saved to 'screenshot.png'!")


if __name__ == "__main__":
    asyncio.run(main())

Observed error

Traceback (most recent call last):
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 28, in <module>
    asyncio.run(main())
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 21, in main
    f.write(base64.b64decode(result.screenshot))
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 80, in b64decode
    s = _bytes_from_decode_data(s)
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 45, in _bytes_from_decode_data
    raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'
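The TypeError happens because result.screenshot comes back as None, which base64.b64decode rejects. Until the fix lands, the decode step can be guarded so a missing screenshot fails gracefully instead of crashing; this is only a sketch, and save_screenshot is a hypothetical helper name, not part of the crawl4ai API:

```python
import base64


def save_screenshot(b64_data, path="screenshot.png"):
    # b64_data mirrors result.screenshot, which may be None when the
    # library skips the capture; bail out instead of raising a TypeError.
    if b64_data is None:
        return False
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))
    return True
```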

pleomax0730 avatar Oct 07 '24 12:10 pleomax0730

After reviewing the source code, I noticed that the take_screenshot function is not being called when setting screenshot=True in arun.

However, it is possible to take a screenshot manually by calling screenshot = await crawler.crawler_strategy.take_screenshot(url=url).

As a feature request, could you add an option for a wait time (asyncio.sleep()) after the await goto inside the take_screenshot function? Some websites have animations or other content that needs time to load, and without a delay the screenshot may not capture the fully rendered page.
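The requested delay could be expressed as a small wrapper around the capture call. This is only a sketch of the idea, not crawl4ai's API: screenshot_after_delay and the capture parameter are hypothetical names, with capture standing in for any zero-argument coroutine function that returns the image bytes (such as a Playwright page.screenshot invoked after goto):

```python
import asyncio


async def screenshot_after_delay(capture, wait_seconds: float = 2.0):
    # Sleep first so animations and lazily loaded content have time to
    # render, then await the capture coroutine and return its bytes.
    await asyncio.sleep(wait_seconds)
    return await capture()
```

With Playwright this would be called as await screenshot_after_delay(page.screenshot, 2.0) immediately after navigation.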

pleomax0730 avatar Oct 07 '24 13:10 pleomax0730

Hi @pleomax0730, you are absolutely right; funny that we missed this. I have updated the library and will soon release version 0.3.6, where this will definitely be fixed. I also added your suggested delay option, and I really appreciate it. You may also check the "0.3.6" branch if you are willing to give it a try. Thank you for supporting the library.

unclecode avatar Oct 08 '24 11:10 unclecode