
[Bug]: Unable to use Azure Open AI with Crawl4AI

Open qaankush-prog opened this issue 4 months ago • 4 comments

crawl4ai version

0.7.4

Expected Behavior

For my task we have to use Azure OpenAI for data extraction, but when I run the code below it simply bypasses the LLM part. That said, if at line 62 I use print(result) I get the page content, but print(result.extracted_content) returns None. I saw a similar bug raised earlier, but the suggested fix didn't help. @unclecode, can you please help? It is a bit urgent.

I should see the extracted content.

Current Behavior

[screenshot]

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets

import asyncio
import json
from pydantic import BaseModel
from typing import Any, Dict
import os

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

os.environ["AZURE_API_KEY"] = "My_KEY"
#os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2025-01-01-preview" # This is just an example, please replace with the correct version

async def main():
    class Entity(BaseModel):
        name: str
        description: str
        
    class Relationship(BaseModel):
        entity1: Entity
        entity2: Entity
        description: str
        relation_type: str

    class KnowledgeGraph(BaseModel):
        entities: list[Entity]
        relationships: list[Relationship]


    llm_cfg = LLMConfig(
        provider="azure/gpt-4o-mini",
        api_token=os.environ["AZURE_API_KEY"],
        base_url=os.environ["AZURE_API_BASE"]
    )

    extraction_strategy = LLMExtractionStrategy(
        llm_config=llm_cfg,
        schema=KnowledgeGraph.model_json_schema(),
        extraction_type="schema",
        input_type="html",
        instruction="""Extract the date from page"""
    )
    async with AsyncWebCrawler() as crawler:
        url = "https://paulgraham.com/love.html"
        result = await crawler.arun(
            url=url,
            bypass_cache=True,
            extraction_strategy=extraction_strategy
        )

        print(result.extracted_content)

    print("Done")

if __name__ == "__main__":
    asyncio.run(main())
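Not a confirmed diagnosis, but one thing worth checking: in recent crawl4ai releases, options such as cache behavior and the extraction strategy are expected inside a CrawlerRunConfig passed as config= to arun(), and the legacy top-level kwargs (bypass_cache=True, extraction_strategy=...) may be silently ignored, which would explain the LLM step never running. A minimal sketch of the same script in that style (untested against a real Azure deployment; the endpoint, key, and deployment name are placeholders):

```python
import asyncio
import os

from pydantic import BaseModel

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Placeholders -- replace with your real Azure OpenAI values.
os.environ.setdefault("AZURE_API_KEY", "My_KEY")
os.environ.setdefault("AZURE_API_BASE", "https://test.openai.azure.com/")
os.environ.setdefault("AZURE_API_VERSION", "2025-01-01-preview")

class Entity(BaseModel):
    name: str
    description: str

async def main():
    llm_cfg = LLMConfig(
        provider="azure/gpt-4o-mini",  # azure/<your-deployment-name>
        api_token=os.environ["AZURE_API_KEY"],
        base_url=os.environ["AZURE_API_BASE"],
    )
    strategy = LLMExtractionStrategy(
        llm_config=llm_cfg,
        schema=Entity.model_json_schema(),
        extraction_type="schema",
        instruction="Extract the named entities from the page",
    )
    # Key difference from the snippet above: everything goes through
    # CrawlerRunConfig instead of legacy arun() keyword arguments.
    run_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,  # replaces bypass_cache=True
        extraction_strategy=strategy,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://paulgraham.com/love.html",
            config=run_cfg,
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())
```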

OS

Windows 11

Python version

3.12.2

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

qaankush-prog avatar Aug 30 '25 07:08 qaankush-prog

It looks like the LLM is not being used:

[screenshot]

qaankush-prog avatar Aug 30 '25 07:08 qaankush-prog

Same here. I can't use Azure OpenAI models/APIs, or any provider other than OpenAI's own API. I was testing Adaptive Crawling from the examples.

PrakharRanjan2909 avatar Aug 31 '25 19:08 PrakharRanjan2909

I am also facing the same issue: scraping produces the raw HTML content, but when using LLMExtractionStrategy with an LLMConfig object (provider, api_token, base_url), it does not return the extracted response.

Preranasingh2001 avatar Sep 08 '25 12:09 Preranasingh2001

Same here. Trying to get our gpt-5-mini deployment on Azure OpenAI to work with crawl4ai.

I try to set it up like:

        classification_strategy = LLMExtractionStrategy(
            llm_config=LLMConfig(
                provider="azure/sp-dev-gpt5-mini", # <= our deployment name with "azure" in front as per the other bugs referencing this (also tried without "azure")
                api_token="our-api-token",
                base_url="our-api-endpoint"
            ),
            schema=PageClassification.model_json_schema(),
            extraction_type="schema",
            instruction=f"""
            #...

...but it still mistakes my Azure deployment name for an AI model name, which it should not, because I'm using Azure as per the LiteLLM docs.

By enabling litellm._turn_on_debug(), I can see the error, "this model isn't mapped yet":

15:41:12 - LiteLLM:DEBUG: utils.py:4983 - Error getting model info: This model isn't mapped yet. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
15:41:12 - LiteLLM:DEBUG: main.py:892 - Error getting model info: This model isn't mapped yet. model=sp-dev-gpt5-mini, custom_llm_provider=azure. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.

Is Crawl4ai not setting the model param in the LiteLLM call to azure/our-deployment-name? Per the LiteLLM docs, we need to set:

response = completion(
    model = "azure/<your_deployment_name>", # <= maybe crawl4ai isn't setting this...?
    messages = [{ "content": "Hello, how are you?","role": "user"}]
)
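For what it's worth, my understanding (not confirmed) is that the "This model isn't mapped yet" DEBUG message comes from LiteLLM's cost/context-window lookup in its pricing table, not from provider routing; routing is driven by the azure/ prefix on the model string. A toy illustration of that prefix split (my own sketch, not LiteLLM's actual code):

```python
def split_provider(model: str) -> tuple[str, str]:
    """Toy sketch (not LiteLLM's real implementation): split a model
    string like 'azure/<deployment>' into (provider, deployment)."""
    if "/" in model:
        provider, _, deployment = model.partition("/")
        return provider, deployment
    return "openai", model  # assumed default when no prefix is given

print(split_provider("azure/sp-dev-gpt5-mini"))
# -> ('azure', 'sp-dev-gpt5-mini')
```

So if crawl4ai strips the azure/ prefix before handing the model string to litellm.completion(), LiteLLM would fall back to treating the bare deployment name as a model name, which matches the error above.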

Also looked at #1248 and #174. No luck there.

@unclecode - am I missing something? Thanks :)

DeltekDavid avatar Oct 27 '25 19:10 DeltekDavid