
Doubt

Open premChezuba opened this issue 1 year ago • 1 comments

How can we make it run with AZURE_OPENAI_API_KEY instead of OPENAI_API_KEY?

premChezuba avatar Oct 22 '24 07:10 premChezuba

Since it uses LiteLLM in the background, you can try the environment variables that LiteLLM accepts.

For env variables:

import os

os.environ["AZURE_API_KEY"] = ""  # "my-azure-api-key"
os.environ["AZURE_API_BASE"] = ""  # "https://example-endpoint.openai.azure.com"
os.environ["AZURE_API_VERSION"] = ""  # "2023-05-15"

For model:

 model = "azure/<your_deployment_name>",
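
For instance, you can sanity-check the Azure configuration directly through LiteLLM before wiring it into crawl4ai. This is just a minimal sketch; the deployment name gpt-4o-mini is an assumption and should be replaced with your own.

import os
import litellm

# Assumes AZURE_API_KEY, AZURE_API_BASE, and AZURE_API_VERSION are set as shown above.
response = litellm.completion(
    model="azure/gpt-4o-mini",  # "azure/<your_deployment_name>"
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)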

syed-al avatar Oct 23 '24 03:10 syed-al

Let's say you have already created a deployment named gpt-4o-mini; the following code shows how to pass it. In this example, we define a simple knowledge-graph schema and extract a knowledge graph from the content of one of Paul Graham's essays. Assuming you have already set your Azure API key, base, and API version, there is nothing much left to do; you can simply run the following code.

os.environ["AZURE_API_KEY"] = "" # "my-azure-api-key"
os.environ["AZURE_API_BASE"] = "" # "https://example-endpoint.openai.azure.com"
os.environ["AZURE_API_VERSION"] = "" # "2023-05-15"

async def main():
    class Entity(BaseModel):
        name: str
        description: str
        
    class Relationship(BaseModel):
        entity1: Entity
        entity2: Entity
        description: str
        relation_type: str

    class KnowledgeGraph(BaseModel):
        entities: List[Entity]
        relationships: List[Relationship]

    extraction_strategy = LLMExtractionStrategy(
            provider = "azure/gpt-4o-mini", 
            api_base=os.environ["AZURE_API_BASE"],
            api_token=os.environ["AZURE_API_KEY"],
            schema=KnowledgeGraph.model_json_schema(),
            extraction_type="schema",
            instruction="""Extract entities and relationships from the given text."""
    )
    async with AsyncWebCrawler() as crawler:
        url = "https://paulgraham.com/love.html"
        result = await crawler.arun(
            url=url,
            bypass_cache=True,
            extraction_strategy=extraction_strategy,
        )
        # print(result.extracted_content)
        with open(os.path.join(__data__, "kb_test.json"), "w") as f:
            f.write(result.extracted_content)

    print("Done")

unclecode avatar Oct 24 '24 10:10 unclecode

Thanks, it worked :)

premChezuba avatar Oct 24 '24 12:10 premChezuba