[Bug]: Crawl returning 'str' object has no attribute 'choices'
crawl4ai version
0.5.0
Expected Behavior
Return a normal crawl matching my schema.
Current Behavior
I am crawling a set of about 600 links. For some links, the crawl works perfectly, but for others the crawler returns:

```json
[
  {
    "index": 0,
    "error": true,
    "tags": ["error"],
    "content": "'str' object has no attribute 'choices'"
  }
]
```
There is no pattern to which links trigger this, which makes me wonder whether it's an API issue. Has anyone seen or encountered this bug?
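(For anyone triaging a large batch like this, a minimal sketch for listing which URLs came back with that error payload; it assumes the standard `CrawlResult` fields `url` and `extracted_content`, and `failed_links` is a hypothetical helper name:)

```python
import json

def failed_links(results):
    """Collect URLs whose LLM extraction returned the error payload
    shown above; `results` is an iterable of crawl4ai CrawlResult objects."""
    bad = []
    for res in results:
        if not res.extracted_content:
            continue
        try:
            blocks = json.loads(res.extracted_content)
        except json.JSONDecodeError:
            bad.append(res.url)
            continue
        if any(isinstance(b, dict) and b.get("error") for b in blocks):
            bad.append(res.url)
    return bad
```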
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
macOS
Python version
3.13
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response
Same here. Using deepseek chat API
> Same here. Using deepseek chat API

Hey! Thanks for replying. Are you using the deepseek API inside crawl4ai, or are you using it standalone?
- Thanks

> Hey! Thanks for replying. Are you using the deepseek API inside crawl4ai, or are you using it standalone?
> - Thanks
Inside `LLMExtractionStrategy`, as:

```python
llm_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(provider="deepseek/deepseek-chat", api_token=os.getenv("DEEPSEEK_API")),
    schema=LeaderboardEntry.model_json_schema(),
    extraction_type="schema",
    instruction=INSTRUCTION_TO_LLM,
    chunk_token_threshold=1000,
    overlap_rate=0.0,
    apply_chunking=True,
    input_format="markdown",
    extra_args={"temperature": 0.0, "max_tokens": 2048},
)
```
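(For context, a strategy like this is typically plugged into the crawler roughly as follows; a minimal sketch, with the target URL assumed and `llm_strategy` taken from the snippet above:)

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    config = CrawlerRunConfig(
        extraction_strategy=llm_strategy,  # the strategy defined above
        cache_mode=CacheMode.BYPASS,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)
        print(result.extracted_content)

asyncio.run(main())
```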
I'm running my APIs through OpenRouter and getting this error.
```python
llm_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(
        provider="deepseek/deepseek-chat",
        api_token=os.getenv("openrouter"),
        base_url="https://openrouter.ai/api/v1",
    ),
)
```
Output:

```
{'index': 0, 'error': True, 'tags': ['error'], 'content': 'litellm.BadRequestError: DeepseekException - {"error":{"message":"deepseek-chat is not a valid model ID","code":400},"user_id":"user_2th1C5iID3WInICREZPY1NCmXhb"}'}
```
Have you run into this at all?
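(Aside: that 400 is a different failure, the model slug itself being rejected. With LiteLLM, OpenRouter models are usually addressed with an `openrouter/` prefix; a hedged sketch, not verified against this exact setup:)

```python
import os
from crawl4ai import LLMConfig

# Assumption: LiteLLM routes OpenRouter models via the "openrouter/" prefix,
# so the full slug becomes "openrouter/deepseek/deepseek-chat".
llm_config = LLMConfig(
    provider="openrouter/deepseek/deepseek-chat",
    api_token=os.getenv("OPENROUTER_API_KEY"),  # hypothetical env var name
)
```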
Yes, I'm facing the same issue with a DeepSeek model through the Groq API.
Same issue on version 0.5.0.post8:
```python
llm_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(
        provider="gemini/gemini-2.0-flash",
        api_token=API_KEY,
    ),
    schema=SportOffersList.model_json_schema(),
    extraction_type="schema",
    instruction=SPORT_OFFER_DATA_PROMPT,
    # chunk_token_threshold=1000,
    # overlap_rate=0.0,
    # apply_chunking=False,
    input_format="markdown",
    extra_args={"temperature": 0},
    verbose=True,
)

# 2. Build the crawler config
crawl_config = CrawlerRunConfig(
    extraction_strategy=llm_strategy,
    cache_mode=CacheMode.DISABLED,
    # exclude_external_links=True,
    # remove_overlay_elements=True,
)
```
It worked with 0.5.0.post4
Having the same issue, I switched to 0.5.0.post4 as mentioned by @Blackvz and it worked
I'm on 0.6.0 (docker `/crawl`) and keep getting:

```json
"extracted_content": "[\n  {\n    \"index\": 0,\n    \"error\": true,\n    \"tags\": [\n      \"error\"\n    ],\n    \"content\": \"'str' object has no attribute 'choices'\"\n  }\n]"
```
I've seen this with OpenRouter models and have now tried Gemini (AI Studio), getting the same result.
This is my request body:
```json
{
  "urls": ["__url__"],
  "browser_config": {
    "type": "BrowserConfig",
    "params": {
      "headless": true,
      "viewport": {
        "type": "dict",
        "value": {
          "width": 1200,
          "height": 800
        }
      }
    }
  },
  "crawler_config": {
    "type": "CrawlerRunConfig",
    "params": {
      "css_selector": "main",
      "extraction_strategy": {
        "type": "LLMExtractionStrategy",
        "params": {
          "input_format": "markdown",
          "llm_config": {
            "type": "LLMConfig",
            "params": {
              "provider": "gemini/gemini-2.5-flash-preview-04-17",
              "api_token": "__key__"
            }
          },
          "schema": {
            "title": "IndexPageLinks",
            "type": "object",
            "properties": {
              "links": {
                "title": "Links",
                "type": "array",
                "description": "List of links found on the page.",
                "items": {
                  "type": "string"
                }
              }
            },
            "required": ["links"]
          },
          "extraction_type": "schema",
          "instruction": "Scan the content of the page for interesting links. Extract the top 5 most important links found on the page and return them"
        }
      }
    }
  }
}
```
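(For reference, a minimal way to send this body to the Docker server from Python; the host and port `localhost:11235` are an assumption based on the default Docker setup, and `request.json` is a hypothetical file holding the body above:)

```python
import json
import requests

# Hypothetical: request.json holds the request body shown above,
# with __url__ and __key__ filled in.
with open("request.json") as f:
    body = json.load(f)

# Assumed default port for the crawl4ai Docker server.
resp = requests.post("http://localhost:11235/crawl", json=body, timeout=120)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```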
same for me
Same here. Using:

```
ollama run deepseek-r1
```
Same issue here, but the reason is different: when I use api.deepseek.com (DeepSeek's official API), the program returns the right result, but deepseek-r1:32 and qwq32b return this error. My program is crawling data from a PDF URL. So is the problem the prompt, or the LLM?
The error occurs because some models return `reasoning_content` instead of `content`, while `crawl4ai/extraction_strategy.py` does `response = response.choices[0].message.content`. Try using a different model to work around this.
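(A minimal sketch of a type-tolerant accessor along those lines; it assumes a LiteLLM-style response object, and the `reasoning_content` fallback reflects this commenter's hypothesis rather than confirmed library behavior:)

```python
def message_text(response):
    """Best-effort extraction of the model's text output: handles a plain
    string, a standard `content` field, and a reasoning model's
    `reasoning_content` field."""
    if isinstance(response, str):
        return response
    message = response.choices[0].message
    return message.content or getattr(message, "reasoning_content", None)
```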
Looks like the `response` variable gets redefined, and that causes an issue with this try-except block.
I was able to resolve this issue for myself by changing the try-except block in `.venv\Lib\site-packages\crawl4ai\extraction_strategy.py` around line 657. I just commented out the redefinition and referred to `response.choices[0].message.content` wherever the redefined `response` variable was referenced:
```python
try:
    # response = response.choices[0].message.content
    blocks = None
    if self.force_json_response:
        blocks = json.loads(response.choices[0].message.content)
        if isinstance(blocks, dict):
            # If it has only one key whose value is a list, use that list, e.g. {"news": [..]}
            if len(blocks) == 1 and isinstance(list(blocks.values())[0], list):
                blocks = list(blocks.values())[0]
            else:
                # If it has only one key whose value is not a list, wrap the dict, e.g. {"article_id": "1234", ...}
                blocks = [blocks]
        elif isinstance(blocks, list):
            # If it is already a list, keep it as-is
            blocks = blocks
    else:
        blocks = extract_xml_data(["blocks"], response.choices[0].message.content)["blocks"]
        # blocks = extract_xml_data(["blocks"], response)["blocks"]
        blocks = json.loads(blocks)
    for block in blocks:
        block["error"] = False
except Exception:
    parsed, unparsed = split_and_parse_json_objects(
        response.choices[0].message.content
    )
    blocks = parsed
    if unparsed:
        blocks.append(
            {"index": 0, "error": True, "tags": ["error"], "content": unparsed}
        )
```
thanks @quangvinh2080 for the lead
Thanks @JWBWork that was indeed the root cause for my errors too.
I think it should be fine to leave the redefinition of `response` as long as the exception block simply uses `response` as the argument to `split_and_parse_json_objects`:
```python
try:
    response = response.choices[0].message.content
    blocks = None
    ...
except Exception:
    parsed, unparsed = split_and_parse_json_objects(response)
    ...
```
This scenario assumes that the exception is not raised as part of accessing response.choices[0].message.content. I'm not super familiar with LiteLLM, so feel free to correct me if I'm wrong here.
@JWBWork's changes fixed the initial problems for me. And yes, if the except block also used `response` instead of assuming it's still a `ModelResponse`, that would avoid raising a second exception while handling the first.
Debugging my problem, I figured out that my local gemma3 model is returning XML instead of JSON inside `<blocks>...</blocks>` (despite the prompt saying not to do that), which the `blocks = json.loads(blocks)` line can't handle.
The way the errors are handled makes it very difficult to realize it's actually the LLM returning bad output.
+1 on that. I had the exact same issue with the model's "thinking" response including the `<blocks>` tag, and only after debugging for a while did I find this root cause.
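(For anyone hitting this failure mode, a minimal guard sketch that would surface it more clearly; the helper name and error message are hypothetical:)

```python
import json

def parse_blocks(payload: str):
    """Fail loudly when the model emitted XML-like content inside
    <blocks> instead of the JSON the prompt asked for."""
    stripped = payload.strip()
    if stripped.startswith("<"):
        raise ValueError(
            "LLM returned XML-like content instead of JSON: " + stripped[:80]
        )
    return json.loads(stripped)
```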
@JWBWork, thanks a ton for digging into this!
I can confirm that the bug reproduces 100% with the docs-example script below (only my OPENAI_API_KEY and the test URL were changed).
```python
# reproduce.py
import os, asyncio
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from crawl4ai import (
    AsyncWebCrawler, CrawlerRunConfig, LLMConfig,
    CacheMode, BrowserConfig
)
from crawl4ai.extraction_strategy import LLMExtractionStrategy

load_dotenv(".env.txt")  # just holds OPENAI_API_KEY

class EmployeeNamesSchema(BaseModel):
    employee_names: list = Field(..., description="List of employee or owner names")

async def test():
    crawler_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        word_count_threshold=1,
        page_timeout=80_000,
        extraction_strategy=LLMExtractionStrategy(
            llm_config=LLMConfig(
                provider="openai/gpt-4o-mini",
                api_token=os.getenv("OPENAI_API_KEY"),
            ),
            schema=EmployeeNamesSchema.model_json_schema(),
            extraction_type="schema",
            instruction='Extract all employee names from the page and return {"employee_names":[...]}.',
            extra_args={"temperature": 0, "max_tokens": 512},
        ),
    )
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        res = await crawler.arun(
            "https://2jtandartspraktijk.tandartsennet.nl/team/",
            config=crawler_cfg,
        )
        print(res.extracted_content)

if __name__ == "__main__":
    asyncio.run(test())
```
Result on main:

```json
[
  {
    "index": 0,
    "error": true,
    "tags": ["error"],
    "content": "'str' object has no attribute 'choices'"
  }
]
```
Result after applying your one-liner from PR #980:

```json
{"employee_names": ["Dr. El Zowini", "T. de Groot", "T. Vorstenbosch", "I. de Jong"]}
```
So the issue isn't in user code; it's in the current repo head. Could we merge the patch (and maybe cut a quick point release) so new users don't hit the same wall?
/cc @unclecode for visibility
Thank you all for your assistance in identifying the root cause of this issue. I will work on a fix, which will be included in our next release. I will most likely add the fix to the May bug-fix branch. I'll keep everyone updated on the progress here.
cc @aravindkarnam
A Chinese discussion forum thread implementing a fix for the same problem: https://linux.do/t/topic/561221
I've resolved the issue. The fix is now available in the 2025-MAY-2 branch.
@ntohidi, thanks for fixing! Is there a pre-release version I can install with the 2025-MAY-2 code? Struggling to get a git branch-based pip installation to work.
@richardgirges This is now merged to main branch in v0.7!