cannot access local variable 'filtered_html'
Hi,
New here. I can't run the sample code; it fails with the error shown further down.
Code:
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Create an instance of AsyncWebCrawler
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Run the crawler on a URL
        result = await crawler.arun(url="https://www.nbcnews.com/business")

        # Print the extracted content
        print(result.markdown)

# Run the async main function
asyncio.run(main())
Got the following error:
[INIT].... → Crawl4AI 0.3.741
[FETCH]... ↓ https://www.nbcnews.com/business... | Status: True | Time: 0.02s
[SCRAPE].. ◆ Processed https://www.nbcnews.com/business... | Time: 39ms
[COMPLETE] ● https://www.nbcnews.com/business... | Status: True | Total: 0.07s
Error using new markdown generation strategy: cannot access local variable 'filtered_html' where it is not associated with a value
Any idea? Thanks.
Plus one here
I am also facing the same issue.
+1
I have a PR that fixes this, but it has not been merged yet. In the meantime, either of these workarounds should help:
Use an older version
pip install --force-reinstall -v "crawl4ai==0.3.731"
Fix locally
- git clone the repository and apply the fix locally, as I did
- Run pip install -e . to update the installed package with the local change
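For anyone unfamiliar with the message: it is Python's UnboundLocalError, raised when a function reads a local variable on a code path where it was never assigned. The sketch below only illustrates that class of bug and the usual fix. The function names and the `content_filter` parameter are hypothetical, and this is not crawl4ai's actual code.

```python
def generate_markdown(html, content_filter=None):
    """Hypothetical stand-in that reproduces the class of bug."""
    if content_filter is not None:
        filtered_html = content_filter(html)
    # Bug: when no filter is given, filtered_html was never assigned,
    # so reading it here raises UnboundLocalError.
    return filtered_html


def generate_markdown_fixed(html, content_filter=None):
    """Same logic, with a default assigned on every code path."""
    filtered_html = html
    if content_filter is not None:
        filtered_html = content_filter(html)
    return filtered_html


if __name__ == "__main__":
    try:
        generate_markdown("<p>hi</p>")
    except UnboundLocalError as exc:
        # The exact wording of the message depends on the Python version.
        print("UnboundLocalError:", exc)

    print(generate_markdown_fixed("<p>hi</p>"))
```

Giving the variable a default before the conditional is the usual remedy. Whether the merged PR does exactly this is something to check in the PR itself.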
+1
+1
Similarly having this issue
+1
Hey everyone, sorry for the inconvenience. I have already merged the PR, thanks @leonson. Version 0.3.743 will include the patch; I'll release it tonight. For a detailed explanation, please see my comment here: https://github.com/unclecode/crawl4ai/issues/287#issuecomment-2503669235
Make sure to update to the latest version by tomorrow, then try this:
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.content_filter_strategy import BM25ContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def main():
    async with AsyncWebCrawler(
        headless=True,
        verbose=True,
    ) as crawler:
        result = await crawler.arun(
            url="URL",
            cache_mode=CacheMode.BYPASS,
        )
        print(len(result.markdown_v2.raw_markdown))
        # For compatibility with previous versions, you can still use:
        # print(len(result.markdown))

if __name__ == "__main__":
    asyncio.run(main())
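To confirm that the installed package is at least the patched release, a small check like the one below may help. It is a sketch: `PATCHED`, `parse_version`, and `has_patch` are illustrative names, the threshold 0.3.743 comes from the comment above, and the parsing is deliberately simple (it only reads the first three numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First release that contains the fix, according to this thread.
PATCHED = (0, 3, 743)


def parse_version(text):
    """Turn '0.3.743' into (0, 3, 743), ignoring non-numeric suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_patch(installed):
    """Return True if the version string is at or above PATCHED."""
    return parse_version(installed) >= PATCHED


if __name__ == "__main__":
    try:
        installed = version("crawl4ai")
    except PackageNotFoundError:
        print("crawl4ai is not installed in this environment")
    else:
        status = "includes the patch" if has_patch(installed) else "needs an upgrade"
        print(f"crawl4ai {installed} {status}")
```

Note that the older 0.3.731 release suggested earlier as a workaround will be reported as needing an upgrade, since the check only compares against the patched version.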
@valtahomes @didntpay @Krish-Goyani @OctAg0nO @vetharupini @marioguima @Ches-ctrl @lexang
Closing this issue, since the patch is already released!