
[Bug]: Unable to use Azure Open AI with Crawl4AI

Open qaankush-prog opened this issue 4 months ago • 4 comments

crawl4ai version

0.7.4

Expected Behavior

For my task we have to use Azure OpenAI for data extraction, but when I run the code below it simply bypasses the LLM part. That said, if at line 62 I use print(result) I get the page content, but print(result.extracted_content) returns None. I saw a similar bug raised earlier, but the suggested fix didn't help. @unclecode, can you please help? It is a bit urgent.

I should see the extracted content.

Current Behavior

[screenshot]

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets

import asyncio
import json
from pydantic import BaseModel
from typing import Any, Dict
import os

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

os.environ["AZURE_API_KEY"] = "My_KEY"
#os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
os.environ["AZURE_API_BASE"] = "https://test.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2025-01-01-preview" # This is just an example, please replace with the correct version

async def main():
    class Entity(BaseModel):
        name: str
        description: str
        
    class Relationship(BaseModel):
        entity1: Entity
        entity2: Entity
        description: str
        relation_type: str

    class KnowledgeGraph(BaseModel):
        entities: list[Entity]
        relationships: list[Relationship]


    llm_cfg = LLMConfig(
        provider="azure/gpt-4o-mini",
        api_token=os.environ["AZURE_API_KEY"],
        base_url=os.environ["AZURE_API_BASE"]
    )

    extraction_strategy = LLMExtractionStrategy(
        llm_config=llm_cfg,
        schema=KnowledgeGraph.model_json_schema(),
        extraction_type="schema",
        input_type="html",
        instruction="""Extract the date from page"""
    )
    async with AsyncWebCrawler() as crawler:
        url = "https://paulgraham.com/love.html"
        result = await crawler.arun(
            url=url,
            bypass_cache=True,
            extraction_strategy=extraction_strategy
        )

        print(result.extracted_content)

    print("Done")

if __name__ == "__main__":
    asyncio.run(main())
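Not a confirmed diagnosis, but one thing worth checking: in recent crawl4ai releases, options such as cache behavior and the extraction strategy are expected inside a CrawlerRunConfig passed as config= to arun(), and the legacy top-level kwargs (bypass_cache=True, extraction_strategy=...) may be silently ignored, which would explain the LLM step never running. A minimal sketch of the same script in that style (untested against a real Azure deployment; the endpoint, key, and deployment name are placeholders):

```python
import asyncio
import os

from pydantic import BaseModel

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Placeholders -- replace with your real Azure OpenAI values.
os.environ.setdefault("AZURE_API_KEY", "My_KEY")
os.environ.setdefault("AZURE_API_BASE", "https://test.openai.azure.com/")
os.environ.setdefault("AZURE_API_VERSION", "2025-01-01-preview")

class Entity(BaseModel):
    name: str
    description: str

async def main():
    llm_cfg = LLMConfig(
        provider="azure/gpt-4o-mini",  # azure/<your-deployment-name>
        api_token=os.environ["AZURE_API_KEY"],
        base_url=os.environ["AZURE_API_BASE"],
    )
    strategy = LLMExtractionStrategy(
        llm_config=llm_cfg,
        schema=Entity.model_json_schema(),
        extraction_type="schema",
        instruction="Extract the named entities from the page",
    )
    # Key difference from the snippet above: everything goes through
    # CrawlerRunConfig instead of legacy arun() keyword arguments.
    run_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,  # replaces bypass_cache=True
        extraction_strategy=strategy,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://paulgraham.com/love.html",
            config=run_cfg,
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())
```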

OS

Windows 11

Python version

3.12.2

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

qaankush-prog avatar Aug 30 '25 07:08 qaankush-prog

It looks like the LLM is not being used:

[screenshot]

qaankush-prog avatar Aug 30 '25 07:08 qaankush-prog

Same here. I can't use Azure OpenAI models/APIs, or any provider other than OpenAI's own API. I was testing Adaptive Crawling from the examples.

PrakharRanjan2909 avatar Aug 31 '25 19:08 PrakharRanjan2909

I am also facing the same issue: scraping produces the raw HTML content, but when using LLMExtractionStrategy with an LLMConfig object (provider, api_token, base_url), it does not return the extracted response.

Preranasingh2001 avatar Sep 08 '25 12:09 Preranasingh2001

Same here. Trying to get our gpt-5-mini deployment on Azure OpenAI to work with crawl4ai.

I try to set it up like:

        classification_strategy = LLMExtractionStrategy(
            llm_config=LLMConfig(
                provider="azure/sp-dev-gpt5-mini", # <= our deployment name with "azure" in front as per the other bugs referencing this (also tried without "azure")
                api_token="our-api-token",
                base_url="our-api-endpoint"
            ),
            schema=PageClassification.model_json_schema(),
            extraction_type="schema",
            instruction=f"""
            #...

...but it still mistakes my Azure deployment name for an AI model name, which it should not, because I'm using Azure as per the LiteLLM docs.

By enabling litellm._turn_on_debug(), I can see the error, "this model isn't mapped yet":

15:41:12 - LiteLLM:DEBUG: utils.py:4983 - Error getting model info: This model isn't mapped yet. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
15:41:12 - LiteLLM:DEBUG: main.py:892 - Error getting model info: This model isn't mapped yet. model=sp-dev-gpt5-mini, custom_llm_provider=azure. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.

Is Crawl4ai not setting the model param in the LiteLLM call to azure/our-deployment-name? Per the LiteLLM docs, we need to set:

response = completion(
    model = "azure/<your_deployment_name>", # <= maybe crawl4ai isn't setting this...?
    messages = [{ "content": "Hello, how are you?","role": "user"}]
)
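For what it's worth, my understanding (not confirmed) is that the "This model isn't mapped yet" DEBUG message comes from LiteLLM's cost/context-window lookup in its pricing table, not from provider routing; routing is driven by the azure/ prefix on the model string. A toy illustration of that prefix split (my own sketch, not LiteLLM's actual code):

```python
def split_provider(model: str) -> tuple[str, str]:
    """Toy sketch (not LiteLLM's real implementation): split a model
    string like 'azure/<deployment>' into (provider, deployment)."""
    if "/" in model:
        provider, _, deployment = model.partition("/")
        return provider, deployment
    return "openai", model  # assumed default when no prefix is given

print(split_provider("azure/sp-dev-gpt5-mini"))
# -> ('azure', 'sp-dev-gpt5-mini')
```

So if crawl4ai strips the azure/ prefix before handing the model string to litellm.completion(), LiteLLM would fall back to treating the bare deployment name as a model name, which matches the error above.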

Also looked at #1248 and #174. No luck there.

@unclecode - am I missing something? Thanks :)

DeltekDavid avatar Oct 27 '25 19:10 DeltekDavid