[Bug]: "Example: Building a Knowledge Graph" produces error contents

Open mattrossman opened this issue 6 months ago • 0 comments

crawl4ai version

0.6.3

Expected Behavior

I expect that running the script from the docs section "Example: Building a Knowledge Graph" will produce a kb_result.json file containing knowledge graph data.

Current Behavior

First, the example uses an incorrectly cased LlmConfig, which I had to change to LLMConfig and add an import for.

- llmConfig = LlmConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),
+ llmConfig = LLMConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),

Then the script produces a runtime error from LiteLLM:

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

When enabling that debug mode, I see an error from OpenAI about context being too large (the example script uses openai/gpt-4):

SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'temperature': 0.1, 'max_tokens': 1500, 'extra_body': {}}
openai.py: Received openai error - Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
RAW RESPONSE:
Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
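To make the numbers in that log concrete, here is a minimal sketch of the token arithmetic and of one way to pre-split input so each request stays under the limit. The helper names (`prompt_budget`, `split_by_token_estimate`) and the 1.3 tokens-per-word ratio are my own assumptions for illustration, not part of crawl4ai or LiteLLM, and a real tokenizer would give different counts. Reserving the completion allowance out of the context window is also a conservative assumption on my part.

```python
def prompt_budget(context_limit: int, max_tokens: int) -> int:
    """Tokens left for the prompt after reserving the completion allowance."""
    return context_limit - max_tokens


def split_by_token_estimate(text: str, budget: int,
                            tokens_per_word: float = 1.3) -> list[str]:
    """Split text on whitespace into chunks whose *estimated* token count fits.

    tokens_per_word is a rough heuristic (an assumption), not a tokenizer.
    """
    max_words = max(1, int(budget / tokens_per_word))
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


# Values taken from the debug log above.
budget = prompt_budget(context_limit=8192, max_tokens=1500)
print(budget)          # 6692
print(12283 > budget)  # True: the messages alone exceed the budget
```

With these numbers the messages would need to be split into at least two requests, before any prompt overhead is counted.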

After changing the model to openai/gpt-4o the script runs, but the resulting kb_result.json file contains a bunch of entries like:

[
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
   //  ...same content for most other items

I see a few chunks that contain valid "entities" and "relationships", but most other web pages I try produce similar error outputs on most chunks.
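For anyone trying to reproduce this, a small sketch like the following can tally how many chunks failed versus succeeded. It assumes only the item shape shown in the excerpt above (an `error` flag and a `content` message on failed items); treating any item without a truthy `error` key as valid is my assumption, and `summarize` and `summarize_file` are hypothetical helper names.

```python
import json
from collections import Counter


def summarize(items: list[dict]) -> Counter:
    """Count valid items and group failed items by their error message."""
    counts = Counter()
    for item in items:
        if item.get("error"):
            counts[f"error: {item.get('content')}"] += 1
        else:
            # Assumption: anything not flagged as an error is a valid chunk.
            counts["ok"] += 1
    return counts


def summarize_file(path: str = "kb_result.json") -> Counter:
    """Load the output file written by the example script and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize(json.load(fh))


# Illustrative data only: one failed item in the shape shown above,
# plus one made-up valid item.
sample = [
    {"index": 0, "error": True, "tags": ["error"],
     "content": "'str' object has no attribute 'choices'"},
    {"index": 1, "entities": [], "relationships": []},
]
for key, count in summarize(sample).items():
    print(count, key)
```

Running `summarize_file()` after the script finishes gives a quick count of error items per distinct message.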

For instance, the output from extracting a knowledge graph from https://python.org is:

[
    {
        "index": 2,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
    {
        "index": 1,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    }
]
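The `content` string has the form of a Python `AttributeError` message, which suggests (this is my guess, not something I have confirmed in the crawl4ai source) that code expecting a completion object with a `.choices` attribute received a plain string instead. The sketch below reproduces the message and shows one hypothetical guard; `first_choice_text` is an illustrative name, not a crawl4ai or LiteLLM function.

```python
from types import SimpleNamespace


def first_choice_text(response):
    """Return the first choice's message content.

    Raises a descriptive TypeError if a plain string arrives instead of
    a response object.
    """
    if isinstance(response, str):
        raise TypeError(
            f"expected a completion object, got str: {response[:80]!r}"
        )
    return response.choices[0].message.content


# Accessing .choices on a str yields the exact text seen in kb_result.json.
try:
    "Error code: 400".choices
except AttributeError as exc:
    message = str(exc)
print(message)  # 'str' object has no attribute 'choices'

# A stand-in object with the expected shape passes through normally.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="{}"))]
)
print(first_choice_text(fake))  # {}
```

If that guess is right, the underlying string is probably an error message from the provider, and surfacing it would make the failure much easier to diagnose than the attribute error does.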

Is this reproducible?

Yes

Inputs Causing the Bug

The site from the example https://www.nbcnews.com/business produces mostly error items. Other sites like https://python.org produce entirely error items.

Steps to Reproduce

Run the script from docs: https://docs.crawl4ai.com/extraction/llm-strategies/#9-example-building-a-knowledge-graph

Permalink: https://github.com/unclecode/crawl4ai/blob/897e0173618d20fea5d8952ccdbcdad0febc0fee/docs/md_v2/extraction/llm-strategies.md?plain=1#L211-L270

Code snippets


OS

MacOS 14.5

Python version

3.13

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

mattrossman · May 22 '25 15:05