[Bug]: Unable to use Azure OpenAI with Crawl4AI
crawl4ai version
0.7.4
Expected Behavior
For my task we have to use Azure OpenAI for data extraction, but when I run the code below it simply bypasses the LLM step. If I print the full result object I get the page content, but print(result.extracted_content) returns None. I saw a similar bug raised earlier, but the fix there didn't help. @unclecode, can you please help? It's a bit urgent.
I should see the extracted content.
Current Behavior
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
```python
import asyncio
import json
import os
from typing import Any, Dict

from pydantic import BaseModel

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

os.environ["AZURE_API_KEY"] = "My_KEY"
# os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2025-01-01-preview"  # This is just an example, please replace with the correct version


async def main():
    class Entity(BaseModel):
        name: str
        description: str

    class Relationship(BaseModel):
        entity1: Entity
        entity2: Entity
        description: str
        relation_type: str

    class KnowledgeGraph(BaseModel):
        entities: list[Entity]
        relationships: list[Relationship]

    llm_cfg = LLMConfig(
        provider="azure/gpt-4o-mini",
        api_token=os.environ["AZURE_API_KEY"],
        base_url=os.environ["AZURE_API_BASE"],
    )
    extraction_strategy = LLMExtractionStrategy(
        llm_config=llm_cfg,
        schema=KnowledgeGraph.model_json_schema(),
        extraction_type="schema",
        input_type="html",
        instruction="""Extract the date from page""",
    )
    async with AsyncWebCrawler() as crawler:
        url = "https://paulgraham.com/love.html"
        result = await crawler.arun(
            url=url,
            bypass_cache=True,
            extraction_strategy=extraction_strategy,
        )
        print(result.extracted_content)
        print("Done")


if __name__ == "__main__":
    asyncio.run(main())
```
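For anyone reproducing this, here is a small self-check (plain Python, a hypothetical helper not part of Crawl4AI) that the environment variables LiteLLM reads for Azure are set and that the provider string carries the `azure/` prefix. The variable names follow the LiteLLM convention (`AZURE_API_KEY`, `AZURE_API_BASE`, `AZURE_API_VERSION`); the dummy values are just placeholders:

```python
import os

def check_azure_env(provider: str) -> list[str]:
    """Return a list of configuration problems (an empty list means OK).

    Checks the env vars LiteLLM reads for Azure and that the provider
    string is of the form 'azure/<deployment-name>'.
    """
    problems = []
    for var in ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"):
        if not os.environ.get(var):
            problems.append(f"{var} is not set")
    if not provider.startswith("azure/"):
        problems.append(f"provider {provider!r} lacks the 'azure/' prefix")
    return problems

# Placeholder values, for illustration only.
os.environ.setdefault("AZURE_API_KEY", "My_KEY")
os.environ.setdefault("AZURE_API_BASE", "https://test.openai.azure.com/")
os.environ.setdefault("AZURE_API_VERSION", "2025-01-01-preview")
print(check_azure_env("azure/gpt-4o-mini"))  # [] when everything is set
```

This rules out the simple misconfiguration cases before blaming the library.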
OS
Windows 11
Python version
3.12.2
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
It looks like the LLM is never invoked:
Same here: I can't use Azure OpenAI models/APIs (or any provider other than OpenAI's own API). I was testing Adaptive Crawling from the examples.
I am also facing the same issue: scraping produces the raw HTML content, but LLMExtractionStrategy (with an LLMConfig object containing provider, api_token, and base_url) returns nothing.
Same here. I'm trying to get our gpt-5-mini deployment on Azure OpenAI to work with Crawl4AI. I set it up like this:
```python
classification_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(
        provider="azure/sp-dev-gpt5-mini",  # <= our deployment name with "azure" in front as per the other bugs referencing this (also tried without "azure")
        api_token="our-api-token",
        base_url="our-api-endpoint",
    ),
    schema=PageClassification.model_json_schema(),
    extraction_type="schema",
    instruction=f"""
    # ...
```
...but it still mistakes my Azure deployment name for an AI model name, which it should not, because I'm configuring Azure as per the LiteLLM docs.
By enabling litellm._turn_on_debug(), I can see the error, "this model isn't mapped yet":
15:41:12 - LiteLLM:DEBUG: utils.py:4983 - Error getting model info: This model isn't mapped yet. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
15:41:12 - LiteLLM:DEBUG: main.py:892 - Error getting model info: This model isn't mapped yet. model=sp-dev-gpt5-mini, custom_llm_provider=azure. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
Is Crawl4AI not setting the model param in the LiteLLM call to azure/our-deployment-name? Per the LiteLLM docs, we need to set:
```python
response = completion(
    model="azure/<your_deployment_name>",  # <= maybe crawl4ai isn't setting this...?
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```
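To make the routing rule concrete, here is a minimal sketch (a hypothetical helper, not part of Crawl4AI or LiteLLM) of the normalization LiteLLM expects: the model string passed to completion() must keep the azure/ prefix in front of the deployment name, because LiteLLM routes on that prefix; a bare deployment name gets treated as a model name and fails the model-info lookup seen in the debug logs above:

```python
def normalize_azure_model(provider: str) -> str:
    """Ensure a LiteLLM model string for Azure keeps the 'azure/' prefix.

    'azure/<deployment-name>' selects the Azure provider in LiteLLM;
    a bare deployment name is looked up as a regular model and fails
    with "This model isn't mapped yet". (Hypothetical helper, for
    illustration only.)
    """
    return provider if provider.startswith("azure/") else f"azure/{provider}"

# Deployment names with and without the prefix normalize to the same string.
print(normalize_azure_model("sp-dev-gpt5-mini"))        # azure/sp-dev-gpt5-mini
print(normalize_azure_model("azure/sp-dev-gpt5-mini"))  # azure/sp-dev-gpt5-mini
```

If Crawl4AI strips the prefix somewhere between LLMConfig and the completion() call, that would explain the error exactly.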
Also looked at #1248 and #174. No luck there.
@unclecode - am I missing something? Thanks :)