crawl4ai
[Bug]: "Example: Building a Knowledge Graph" produces error contents
crawl4ai version
0.6.3
Expected Behavior
I expect that running the script from the docs section "Example: Building a Knowledge Graph" produces a kb_result.json file containing knowledge graph data.
Current Behavior
First, the example uses an incorrectly cased LlmConfig, which I had to change to LLMConfig and add an import for.
- llmConfig = LlmConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),
+ llmConfig = LLMConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')),
Then the script produces a runtime error from LiteLLM:
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.
When enabling that debug mode, I see an error from OpenAI about context being too large (the example script uses openai/gpt-4):
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'temperature': 0.1, 'max_tokens': 1500, 'extra_body': {}}
openai.py: Received openai error - Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
RAW RESPONSE:
Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 12283 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
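The numbers in that error can be checked by hand. Here is a minimal sketch of the arithmetic, using only the values from the log above. The helper name `context_overflow` is hypothetical, and whether the API also counts the requested `max_tokens` against the limit is an assumption on my part:

```python
def context_overflow(prompt_tokens: int, context_limit: int, max_tokens: int = 0) -> int:
    """Return how many tokens the request exceeds the limit by (0 if it fits)."""
    return max(0, prompt_tokens + max_tokens - context_limit)


# Values taken from the debug log: 12283 message tokens, 8192-token limit.
print(context_overflow(12283, 8192))        # messages alone: 4091 tokens over
print(context_overflow(12283, 8192, 1500))  # assuming max_tokens also counts: 5591 over
```

Either way, the messages for that chunk would need to shrink by at least roughly a third to fit the 8192-token model.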
After changing the model to openai/gpt-4o, the script runs, but the resulting kb_result.json file contains a bunch of entries like:
[
{
"index": 0,
"error": true,
"tags": [
"error"
],
"content": "'str' object has no attribute 'choices'"
},
// ...same content for most other items
I see a few chunks that contain valid "entities" and "relationships", but most other web pages I try produce similar error outputs on most chunks.
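The error text itself is the standard Python AttributeError message for reading `.choices` from a string. Here is a minimal sketch that reproduces the same message. It assumes (I have not confirmed this in crawl4ai's source) that somewhere a plain string, such as an error message, is being handled as if it were a completion response object. The function `first_choice_text` is hypothetical:

```python
def first_choice_text(response):
    """Read the first choice from a chat-completion-like response object."""
    return response.choices[0].message.content


try:
    # A string is passed where a response object is expected.
    first_choice_text("some error text returned instead of a response")
except AttributeError as exc:
    message = str(exc)
    print(message)
```

If that assumption holds, the real failure (rate limit, context length, or something else) is probably being hidden behind this secondary error.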
The output from extracting a knowledge graph from https://python.org, for instance, is:
[
{
"index": 2,
"error": true,
"tags": [
"error"
],
"content": "'str' object has no attribute 'choices'"
},
{
"index": 1,
"error": true,
"tags": [
"error"
],
"content": "'str' object has no attribute 'choices'"
},
{
"index": 0,
"error": true,
"tags": [
"error"
],
"content": "'str' object has no attribute 'choices'"
}
]
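To quantify how many chunks failed, the output file can be tallied. Here is a minimal sketch, assuming kb_result.json is a JSON list of objects shaped like the ones above. The helper name `summarize_results` is hypothetical:

```python
import json
from collections import Counter


def summarize_results(items):
    """Count error vs. non-error items and tally the distinct error messages."""
    errors = [item for item in items if item.get("error")]
    return {
        "total": len(items),
        "errors": len(errors),
        "ok": len(items) - len(errors),
        "messages": Counter(item.get("content") for item in errors),
    }


# Sample mirroring the python.org output above; for a real run, load the file
# instead, e.g. json.load(open("kb_result.json")).
sample = json.loads("""
[
  {"index": 2, "error": true, "tags": ["error"], "content": "'str' object has no attribute 'choices'"},
  {"index": 1, "error": true, "tags": ["error"], "content": "'str' object has no attribute 'choices'"},
  {"index": 0, "error": true, "tags": ["error"], "content": "'str' object has no attribute 'choices'"}
]
""")
summary = summarize_results(sample)
print(summary["errors"], "of", summary["total"], "items are errors")
```

For the python.org sample this reports that all 3 items are errors, each with the same message.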
Is this reproducible?
Yes
Inputs Causing the Bug
The site from the example https://www.nbcnews.com/business produces mostly error items. Other sites like https://python.org produce entirely error items.
Steps to Reproduce
Run the script from docs: https://docs.crawl4ai.com/extraction/llm-strategies/#9-example-building-a-knowledge-graph
Permalink: https://github.com/unclecode/crawl4ai/blob/897e0173618d20fea5d8952ccdbcdad0febc0fee/docs/md_v2/extraction/llm-strategies.md?plain=1#L211-L270
Code snippets
OS
macOS 14.5
Python version
3.13
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response