[Bug]: Status code for redirect URLs is not correct

Open Dev4011 opened this issue 10 months ago • 6 comments

crawl4ai version

0.4.248

Expected Behavior

For URLs that are redirected, the status code must come in the 300 series.

Current Behavior

Hi @unclecode, first of all, I really appreciate the amazing tool that you and the entire team have built.

While crawling, I found that status_code is correct for 200 and 404 URLs, but it never reports a 300-series redirect code. It returns 200 even for URLs that have been redirected.

Is this reproducible?

Yes

Inputs Causing the Bug

URL: http://testfire.net/doLogin

Steps to Reproduce

1. Run the code below
2. Check the printed status_code and redirected URL

Code snippets

import asyncio

import nest_asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

# Google Colab already runs an event loop; nest_asyncio allows re-entering it.
nest_asyncio.apply()

async def main():
    url = "http://testfire.net/doLogin"
    async with AsyncWebCrawler(
        headless=True,
        verbose=True,
    ) as crawler:
        # Bypass the cache so the page is fetched fresh on every run.
        result = await crawler.arun(url, cache_mode=CacheMode.BYPASS)

    print(f"Original URL: {url}")
    # Bug: this prints 200 even though the URL is redirected.
    print(f"Status code: {result.status_code}")
    print(f"Redirected URL: {result.redirected_url}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

OS

Google Colab

Python version

3.11.11

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

The browser network panel showing that the link has been redirected: Image

The code block showing incorrect status_code: Image
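For anyone who wants to confirm the redirect outside the browser, here is a minimal sketch using only the Python standard library. It is not part of crawl4ai: `first_hop_status`, `RedirectHandler`, and `demo` are hypothetical names made up for illustration, and the local test server with its `/old` and `/new` paths is an assumption that stands in for a real redirecting site. `http.client` does not follow redirects on its own, so it reports the status of the first response, which is roughly what the network panel shows for the original request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: /old redirects to /new, anything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep the output readable.
        pass


def first_hop_status(url):
    """Return (status, Location header) of the first response, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the stand-in server on a free port and probe both paths."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        return first_hop_status(base + "/old"), first_hop_status(base + "/new")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Calling `first_hop_status` on a real URL works the same way; a 3xx status together with a `Location` header means the server redirected the first request, which is the code the crawler result is expected to reflect.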

Dev4011 avatar Feb 12 '25 12:02 Dev4011

@aravindkarnam @unclecode can you kindly help me out with this?

Dev4011 avatar Feb 18 '25 08:02 Dev4011

@Dev4011 thanks for the kind words. You're right: we should not just return the redirected URL, the status code should be updated as well. @aravindkarnam please add this to the list.

unclecode avatar Feb 18 '25 09:02 unclecode

Thank you so much @unclecode 👍

Dev4011 avatar Feb 18 '25 09:02 Dev4011

@unclecode @aravindkarnam I have a similar issue: I am crawling a URL, but it is not getting redirected.

Image

Image

manikaran1993 avatar Apr 26 '25 22:04 manikaran1993

Same here. When crawling a URL that redirects to another URL (like manikaran1993's Google News case), the crawler only crawls the original page rather than the redirected one.

aceonaceon avatar Apr 27 '25 00:04 aceonaceon

Thank you for reporting this issue.
We have just merged a fix that now reports the correct 30x status code.
The patch will be included in the next release.

ntohidi avatar Apr 30 '25 10:04 ntohidi