crawl4ai [Bug]: cleaned_html returned without classes and ids for elements.

Open igoralentyev opened this issue 1 month ago • 2 comments

crawl4ai version

0.7.4

Expected Behavior

I expected cleaned_html to keep class and id attributes; otherwise I can't use it for further analysis.

Current Behavior

cleaned_html contains no class or id attributes.

Is this reproducible?

Yes

Inputs Causing the Bug

from crawl4ai import CacheMode, CrawlerRunConfig

crawler_config = CrawlerRunConfig(
    exclude_all_images=True,
    excluded_tags=['header', 'footer', 'meta', 'script', 'style'],
    excluded_selector=excluded_selector,  # CSS selector string defined elsewhere by the reporter (not shown)
    remove_overlay_elements=False,
    keep_data_attributes=True,
    # Log first, then resolve after 5 seconds (the original logged after `return`, so it never ran)
    wait_for="js:() => new Promise(resolve => { console.log('Waiting for 5 seconds'); setTimeout(() => resolve(true), 5000); })",
    # delay_before_return_html=3,
    locale="en-US",
    magic=True,
    cache_mode=CacheMode.DISABLED,
)

Steps to Reproduce

Code snippets

OS

Linux

Python version

3.12

Browser

Default (none specified)

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Nov 10 '25 13:11 igoralentyev

Actually, it's pretty easy to fix/patch.

You just need to change IMPORTANT_ATTRS at line 50 of the library's config.py.

Like this:

IMPORTANT_ATTRS = ["src", "href", "class", "id"] # Modified: removed alt, title, width, height - added class, id

Result: this fix, paired with keep_data_attributes=False, returns really clean HTML that still has classes and ids.
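
To illustrate why the list matters, here is a minimal standalone sketch of attribute whitelisting built on Python's standard html.parser. It is not crawl4ai's actual implementation: AttrFilter and filter_attributes are hypothetical names, and DEFAULT_ATTRS is only inferred from the "removed alt, title, width, height" note above. It assumes the cleaner drops any attribute that is neither in the whitelist nor a data-* attribute kept by a flag.

```python
from html import escape
from html.parser import HTMLParser

# Inferred from the comment above; not copied from crawl4ai's source.
DEFAULT_ATTRS = ["src", "href", "alt", "title", "width", "height"]
PATCHED_ATTRS = ["src", "href", "class", "id"]


class AttrFilter(HTMLParser):
    """Re-emit HTML, keeping only whitelisted attributes (hypothetical helper)."""

    def __init__(self, keep_attrs, keep_data_attributes=False):
        super().__init__(convert_charrefs=False)
        self.keep_attrs = set(keep_attrs)
        self.keep_data = keep_data_attributes
        self.out = []

    def _kept(self, attrs):
        # Build the attribute string from the attributes that pass the filter.
        parts = []
        for name, value in attrs:
            is_data = self.keep_data and name.startswith("data-")
            if name in self.keep_attrs or is_data:
                if value is None:
                    parts.append(name)  # boolean attribute, e.g. "disabled"
                else:
                    parts.append(f'{name}="{escape(value, quote=True)}"')
        return "".join(" " + p for p in parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._kept(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._kept(attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def filter_attributes(html, keep_attrs, keep_data_attributes=False):
    """Return html with every non-whitelisted attribute removed."""
    parser = AttrFilter(keep_attrs, keep_data_attributes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<div id="main" class="card" style="color:red" data-x="1"><a href="/p" title="t">link</a></div>'
print(filter_attributes(sample, DEFAULT_ATTRS))
# <div><a href="/p" title="t">link</a></div>
print(filter_attributes(sample, PATCHED_ATTRS))
# <div id="main" class="card"><a href="/p">link</a></div>
```

With the default-style list the div loses both id and class, which matches the behavior reported here; with the patched list they survive, and data-x only comes back when keep_data_attributes=True is passed.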

Nov 10 '25 14:11 igoralentyev

Hello @igoralentyev, could you fix it and send a pull request, if possible?

Nov 12 '25 02:11 Ahmed-Tawfik94
