[Bug]: Status code for redirect URLs is not correct
crawl4ai version
0.4.248
Expected Behavior
For URLs that are redirected, the status code should be in the 3xx range (for example 301 or 302).
Current Behavior
Hi @unclecode, first of all, I really appreciate the amazing tool that you and the entire team have built.
While crawling, I found that status_code is correct for URLs that return 200 or 404, but it never reports a 3xx redirect code. Instead, it returns 200 even for URLs that have been redirected.
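The difference described above can be reproduced without crawl4ai. The following is a minimal sketch using only the Python standard library: a throwaway local server (the `RedirectHandler` class and the `/old` and `/new` paths are hypothetical names made up for this illustration) answers `/old` with a 302, and two clients fetch it. `http.client` does not follow redirects, so it sees the 3xx of the first hop; `urllib` follows them by default, so it sees the final 200, which is analogous to the value reported in this issue. It does not use or describe crawl4ai internals.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old answers 302 -> /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def first_hop_status(host, port, path):
    """Status and Location of the first response (redirects are not followed)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def final_status(url):
    """Status and URL after redirects are followed (urllib's default behavior)."""
    # An empty ProxyHandler avoids routing the local request through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.geturl()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        first = first_hop_status("127.0.0.1", port, "/old")
        final = final_status(f"http://127.0.0.1:{port}/old")
    finally:
        server.shutdown()
        server.server_close()
    return first, final, port


if __name__ == "__main__":
    first, final, _ = run_demo()
    print("First hop:", first)
    print("After following redirects:", final)
```

The same two functions can be pointed at a real URL to check which status the server sends on the first hop before comparing it with the crawler output.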
Is this reproducible?
Yes
Inputs Causing the Bug
URL: http://testfire.net/doLogin
Steps to Reproduce
1. Run the code below.
2. Check the status_code and redirected URL that are printed.
Code snippets
import asyncio

import nest_asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

# Colab already runs an event loop; nest_asyncio allows re-entering it.
nest_asyncio.apply()


async def main():
    async with AsyncWebCrawler(
        headless=True,
        verbose=True,
    ) as crawler:
        url = "http://testfire.net/doLogin"
        # Bypass the cache so the request is actually sent.
        result = await crawler.arun(url, cache_mode=CacheMode.BYPASS)
        print(f"Original URL: {url}")
        print(f"Status code: {result.status_code}")  # prints 200, expected 3xx
        print(f"Redirected URL: {result.redirected_url}")


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
OS
Google Colab
Python version
3.11.11
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
[Screenshot: browser network panel showing that the link has been redirected]
[Screenshot: code block showing the incorrect status_code]
@aravindkarnam @unclecode can you kindly help me out with this?
@Dev4011 thanks for the kind words. You're right: we should not just return the redirected URL, the status code should be updated as well. @aravindkarnam please add these to the list.
Thank you so much @unclecode 👍
@unclecode @aravindkarnam I have a similar issue, I am trying a url but it's not getting redirected.
Same here. When crawling a URL that redirects to another URL (as in manikaran1993's case, or with Google News), the crawler only crawls the original page rather than the redirected one.
Thank you for reporting this issue.
We have just merged a fix that now reports the correct 30x status code.
The patch will be included in the next release.