crawl4ai
Can't get screenshot working
Environment
System: Windows 11
Python version: 3.10.15
crawl4ai version: 0.3.5
Code to reproduce
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
import base64
import asyncio

crawler_strategy = AsyncPlaywrightCrawlerStrategy(
    verbose=True,
    headless=True,
)

async def main():
    async with AsyncWebCrawler(verbose=True, crawler_strategy=crawler_strategy) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business", bypass_cache=True, screenshot=True
        )
        print(result.markdown)
        # Save the screenshot to a file
        with open("screenshot.png", "wb") as f:
            f.write(base64.b64decode(result.screenshot))
        print("Screenshot saved to 'screenshot.png'!")

if __name__ == "__main__":
    crawler = AsyncWebCrawler(verbose=True)
    asyncio.run(main())
Error output
Traceback (most recent call last):
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 28, in <module>
    asyncio.run(main())
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 21, in main
    f.write(base64.b64decode(result.screenshot))
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 80, in b64decode
    s = _bytes_from_decode_data(s)
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 45, in _bytes_from_decode_data
    raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'
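The TypeError above comes from passing None to base64.b64decode, which means result.screenshot was never populated. As a minimal sketch using only the standard library, the save step could guard against that case. The helper name save_screenshot is hypothetical and not part of crawl4ai.

```python
import base64
from pathlib import Path
from typing import Optional


def save_screenshot(b64_data: Optional[str], path: str) -> bool:
    """Decode a base64 screenshot and write it to `path`.

    Returns False (and writes nothing) when no screenshot was captured.
    """
    if b64_data is None:
        # This is the case that raises the TypeError in the traceback.
        return False
    Path(path).write_bytes(base64.b64decode(b64_data))
    return True
```

In the reproduction script this would replace the open/write pair, for example: if not save_screenshot(result.screenshot, "screenshot.png"): print("No screenshot returned"). This only makes the failure explicit; it does not fix the missing screenshot.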
After reviewing the source code, I noticed that take_screenshot is never called when screenshot=True is passed to arun, so result.screenshot stays None.
As a workaround, a screenshot can be taken manually by calling screenshot = await crawler.crawler_strategy.take_screenshot(url=url).
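A rough sketch of how that workaround might be wrapped, assuming the behaviour described in this report for 0.3.5: arun returns a result whose screenshot attribute can be None, and crawler_strategy.take_screenshot(url=url) returns the base64 string. The function name crawl_with_screenshot is hypothetical, and the crawler is passed in rather than constructed here, so the sketch relies only on the two calls quoted in this thread.

```python
from typing import Any, Optional


async def crawl_with_screenshot(crawler: Any, url: str) -> Optional[str]:
    """Return a base64 screenshot, falling back to the strategy directly."""
    result = await crawler.arun(url=url, bypass_cache=True, screenshot=True)
    if result.screenshot is not None:
        return result.screenshot
    # Workaround from the report: ask the crawler strategy for the screenshot.
    return await crawler.crawler_strategy.take_screenshot(url=url)
```

Inside the async with AsyncWebCrawler(...) as crawler block from the reproduction script, this would be called as screenshot = await crawl_with_screenshot(crawler, "https://www.nbcnews.com/business"). Note that the fallback loads the page a second time.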
Feature request: could you add an option for a wait time (for example, asyncio.sleep()) after the await goto call inside take_screenshot? Some websites have animations or other content that needs time to load; without a delay, the screenshot may not capture the fully rendered page.
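To illustrate the requested behaviour, here is a minimal sketch that assumes a Playwright-style async page object exposing goto and screenshot. It is not the library's actual take_screenshot implementation; the function name screenshot_after_delay and the delay parameter are hypothetical.

```python
import asyncio
import base64
from typing import Any


async def screenshot_after_delay(page: Any, url: str, delay: float = 2.0) -> str:
    """Navigate, wait `delay` seconds for rendering, then capture the page."""
    await page.goto(url)
    # Give animations and lazily loaded content time to settle.
    await asyncio.sleep(delay)
    png_bytes = await page.screenshot(full_page=True)
    # Return base64 text so callers can decode it with base64.b64decode.
    return base64.b64encode(png_bytes).decode("utf-8")
```

A fixed sleep is the simplest option, but it slows every capture by the full delay, so it works best as an opt-in parameter.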
Hi @pleomax0730, you are absolutely right; this is something we simply missed. I have updated the library and will soon release version 0.3.6, which will include the fix. I have also added your suggested delay option, and I really appreciate the idea. If you would like to try it now, you can check out the "0.3.6" branch. Thank you for supporting the library.