crawl4ai [Bug]: "Example: Building a Knowledge Graph" produces error contents

Open mattrossman opened this issue 6 months ago • 0 comments

crawl4ai version

0.6.3

Expected Behavior

I expect that running the script in the docs section "Example: Building a Knowledge Graph" will produce a kb_result.json file containing knowledge graph data.

Current Behavior

First, the example uses an incorrectly cased LlmConfig, which I had to change to LLMConfig and add an import for:

- llmConfig = LlmConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),
+ llmConfig = LLMConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),

Then the script produces a runtime error from LiteLLM:

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

When enabling that debug mode, I see an error from OpenAI about context being too large (the example script uses openai/gpt-4):

SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'temperature': 0.1, 'max_tokens': 1500, 'extra_body': {}}
openai.py: Received openai error - Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
RAW RESPONSE:
Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
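The numbers in this log give a budget to check input against: a context window of 8192 tokens, with max_tokens=1500 set for the completion. Below is a minimal sketch of splitting text to stay under that budget. It assumes the completion allowance counts against the same window, uses a rough heuristic of about 4 characters per token (an approximation, not OpenAI's tokenizer), and the helper names and the 500-token prompt overhead are hypothetical.

```python
CONTEXT_LIMIT = 8192      # from the error message above
MAX_COMPLETION = 1500     # max_tokens from the debug log above
CHARS_PER_TOKEN = 4       # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer would be more accurate."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def split_to_budget(text: str, prompt_overhead_tokens: int = 500) -> list[str]:
    """Split text into pieces whose estimated size fits the input budget."""
    # prompt_overhead_tokens is a hypothetical allowance for instructions/schema.
    budget_tokens = CONTEXT_LIMIT - MAX_COMPLETION - prompt_overhead_tokens
    if budget_tokens <= 0:
        raise ValueError("no room left for input text")
    budget_chars = budget_tokens * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]


if __name__ == "__main__":
    # Stand-in for a page of roughly 12283 tokens, the size reported in the error.
    page = "x" * (12283 * CHARS_PER_TOKEN)
    pieces = split_to_budget(page)
    print(len(pieces), [estimate_tokens(p) for p in pieces])
```

A character-based split like this can cut through words or markup, so it is only a way to sanity-check sizes, not a replacement for real chunking.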

After changing the model to openai/gpt-4o, the script runs, but the resulting kb_result.json file contains a bunch of entries like:

[
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
   //  ...same content for most other items

I see a few chunks that contain valid "entities" and "relationships", but most other web pages I try produce similar error outputs on most chunks.
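The text in "content" is the message of a Python AttributeError, raised when code reads `.choices` from a plain string instead of a response object. The sketch below only reproduces that message and shows one possible defensive check. `first_choice_text` and the stand-in response object are hypothetical and are not taken from crawl4ai or LiteLLM; where the string actually comes from in this bug is not established here.

```python
from types import SimpleNamespace


def first_choice_text(response):
    """Return the first choice's text, or raise a clearer error for bad input."""
    if not hasattr(response, "choices"):
        raise TypeError(
            f"expected a completion object, got {type(response).__name__}: {response!r}"
        )
    return response.choices[0].message.content


# Hypothetical stand-in shaped like a chat-completion response.
ok = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="{}"))])

try:
    "some error text".choices  # what unguarded code effectively does with a string
except AttributeError as exc:
    unguarded_message = str(exc)

try:
    first_choice_text("some error text")
except TypeError as exc:
    guarded_message = str(exc)

print(unguarded_message)
print(guarded_message)
print(first_choice_text(ok))
```

With a check like this, the string that was returned in place of a response would appear in the error, which may help identify the underlying failure.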

The output from extracting a knowledge graph from https://python.org, for instance, is:

[
    {
        "index": 2,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
    {
        "index": 1,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    },
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "'str' object has no attribute 'choices'"
    }
]
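To compare pages quickly, it can help to count how many entries in the output are errors and which messages they carry. This is a small sketch using only the standard library; `summarize_results` is a hypothetical helper, and the sample data is two of the entries shown above.

```python
import json
from collections import Counter


def summarize_results(items):
    """Count error and non-error entries and tally distinct error messages."""
    errors = [item for item in items if item.get("error")]
    return {
        "total": len(items),
        "errors": len(errors),
        "ok": len(items) - len(errors),
        "messages": Counter(item.get("content", "") for item in errors),
    }


# Sample data copied from the python.org output above (two entries shown).
sample = json.loads("""
[
    {"index": 2, "error": true, "tags": ["error"],
     "content": "'str' object has no attribute 'choices'"},
    {"index": 1, "error": true, "tags": ["error"],
     "content": "'str' object has no attribute 'choices'"}
]
""")

summary = summarize_results(sample)
print(summary["errors"], "of", summary["total"], "entries are errors")
for message, count in summary["messages"].most_common():
    print(count, message)
```

To run it against a real result, load the list from kb_result.json with `json.load` in place of the inline sample.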

Is this reproducible?

Yes

Inputs Causing the Bug

The site from the example https://www.nbcnews.com/business produces mostly error items. Other sites like https://python.org produce entirely error items.

Steps to Reproduce

Run the script from docs: https://docs.crawl4ai.com/extraction/llm-strategies/#9-example-building-a-knowledge-graph

Permalink: https://github.com/unclecode/crawl4ai/blob/897e0173618d20fea5d8952ccdbcdad0febc0fee/docs/md_v2/extraction/llm-strategies.md?plain=1#L211-L270

Code snippets

OS

MacOS 14.5

Python version

3.13

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

May 22 '25 15:05 mattrossman
