crawl4ai
[Bug]: target_elements doesn't work
crawl4ai version
0.7.4
Expected Behavior
Hello,
I am having an issue using target_elements to restrict the Markdown output to certain content while still letting the crawler see all links and do long crawls. It won't work in Python scripts, and even the simplest possible CLI test fails:
crwl "https://ai.google.dev/gemini-api/docs" -c "target_elements=['.devsite-banner-announcement']" -o markdown --bypass-cache
That command provides the entire page content as output instead of the expected output below:
EXPECTED BEHAVIOR OUTPUT:
Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more
Current Behaviour
crwl "https://ai.google.dev/gemini-api/docs" -c "target_elements=['.devsite-banner-announcement']" -o markdown --bypass-cache
That command still outputs the entire page content.
It should only be providing the following content:
Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! [Learn more]
I can confirm the selector is correct, because the same selector works fine with the css_selector option:
(crawl4ai) jay@MSI:~/projects/crawl4ai$ crwl "https://ai.google.dev/gemini-api/docs" -c "css_selector=.devsite-banner-announcement" -o markdown --bypass-cache
Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more
Is this reproducible?
Yes
Inputs Causing the Bug
crwl "https://ai.google.dev/gemini-api/docs" -c "target_elements=['.devsite-banner-announcement']" -o markdown --bypass-cache
Steps to Reproduce
crwl "https://ai.google.dev/gemini-api/docs" -c "target_elements=['.devsite-banner-announcement']" -o markdown --bypass-cache
Code snippets
crwl "https://ai.google.dev/gemini-api/docs" -c "target_elements=['.devsite-banner-announcement']" -o markdown --bypass-cache
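A minimal Python sketch of the same comparison, for the script case mentioned above. It assumes `CrawlerRunConfig` accepts `target_elements` (a list of selectors) and `css_selector` (a single selector) in the same way as the CLI options, and that `str(result.markdown)` yields the raw Markdown. The helper `looks_scoped` and its 500-character threshold are hypothetical, added only to make the result easy to read; this has not been verified against 0.7.4.

```python
import asyncio

URL = "https://ai.google.dev/gemini-api/docs"
SELECTOR = ".devsite-banner-announcement"


def looks_scoped(markdown: str, expected_snippet: str, max_chars: int = 500) -> bool:
    """Hypothetical heuristic: the output counts as scoped to the banner if it
    contains the banner text and is short (not the whole page)."""
    text = markdown.strip()
    return expected_snippet in text and len(text) <= max_chars


async def crawl_markdown(**selector_kwargs) -> str:
    # Imported lazily so the helper above is usable without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

    # Assumption: both selector options are plain CrawlerRunConfig keyword arguments.
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, **selector_kwargs)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=URL, config=config)
    return str(result.markdown)


async def compare() -> None:
    expected = "is now available in the Gemini API"
    by_target = await crawl_markdown(target_elements=[SELECTOR])
    by_css = await crawl_markdown(css_selector=SELECTOR)
    print("target_elements scoped:", looks_scoped(by_target, expected))
    print("css_selector scoped:   ", looks_scoped(by_css, expected))


# Run with: asyncio.run(compare())
```

If the behaviour matches the CLI report, the `target_elements` line would print `False` and the `css_selector` line `True`.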
OS
Windows 11 WSL 2
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response