crawl4ai
[Bug]: 422 Unprocessable Entity: Invalid value 'FIT' for 'f' parameter in /md endpoint called by LibreChat
crawl4ai version
0.6.0
Expected Behavior
The request generated by LibreChat with Gemini models against the /mcp/md endpoint should send the f parameter as one of the valid lowercase enum values ("raw", "fit", "bm25", or "llm"), specifically "fit" if the LLM intends that filter type, in keeping with the enum definition.
Current Behavior
Description:
When the Crawl4AI /mcp/md endpoint is called, specifically via a system using LibreChat with Gemini models for tool interaction, a 422 Unprocessable Entity error is consistently returned. The error details indicate a validation failure for the f parameter in the request body, stating that the provided input value "FIT" is invalid.
Error Message:
{
"error": 422,
"detail": "{\"detail\":[{\"type\":\"enum\",\"loc\":[\"body\",\"f\"],\"msg\":\"Input should be 'raw', 'fit', 'bm25' or 'llm'\",\"input\":\"FIT\",\"ctx\":{\"expected\":\"'raw', 'fit', 'bm25' or 'llm'\"}}]"
}
Relevant Schema Information:
According to the server's public schema document (accessible, for example, at http://localhost:11235/mcp/schema), the definition for the f parameter within the md tool's inputSchema is as follows:
"f": {
"$ref": "#/$defs/FilterType",
"default": "fit",
"description": "Contentāfilter strategy: FIT, RAW, BM25, or LLM"
},
...
"$defs": {
"FilterType": {
"enum": [
"raw",
"fit",
"bm25",
"llm"
],
"title": "FilterType",
"type": "string"
}
}
This schema clearly defines the f parameter as requiring one of the lowercase string values from the FilterType enum: "raw", "fit", "bm25", or "llm".
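The strict value matching can be reproduced locally with a plain Python `Enum` mirroring the schema's `FilterType` definition (a sketch for illustration, not the server's actual code):

```python
from enum import Enum

# Local mirror of the FilterType enum from the /mcp/schema document;
# this is a sketch for illustration, not the server's actual code.
class FilterType(str, Enum):
    RAW = "raw"
    FIT = "fit"
    BM25 = "bm25"
    LLM = "llm"

# The uppercase value sent by the LibreChat/Gemini client is rejected,
# because enum lookup matches values, not member names:
try:
    FilterType("FIT")
except ValueError as exc:
    print(exc)  # 'FIT' is not a valid FilterType

# The lowercase value defined in the enum is accepted:
assert FilterType("fit") is FilterType.FIT
```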
Problem:
The error message confirms that the server is strictly validating the f parameter against the lowercase values defined in the enum. The request generated by the LibreChat/Gemini setup is sending the value "FIT" (uppercase), which does not match any of the required lowercase enum values.
This indicates that either the LibreChat tool calling logic or the Gemini model's interpretation of the schema is generating the incorrect parameter value case.
Potential Cause of LLM Behavior:
The schema itself presents the valid options for the f parameter in two slightly conflicting ways regarding case:
- The strict `enum` definition requires lowercase (`"raw"`, `"fit"`, etc.).
- The human-readable `description` provides the options in uppercase ("Content-filter strategy: FIT, RAW, BM25, or LLM").
It is highly probable that the LLM, in processing the tool schema description to determine the correct parameter value, picked up the uppercase format provided in the description field, leading it to generate "FIT" instead of the strictly required "fit" from the enum. The server's validation, correctly adhering to the enum definition, rejects this uppercase value.
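Besides fixing the description, the server could tolerate the casing mismatch. Below is a sketch of one possible mitigation, assuming the server's `FilterType` is a plain `str`-backed `Enum` (whether pydantic's enum validation honors this hook depends on the pydantic version, so this is an assumption, not the project's actual fix): the `_missing_` classmethod can fall back to a case-insensitive lookup.

```python
from enum import Enum

class FilterType(str, Enum):
    RAW = "raw"
    FIT = "fit"
    BM25 = "bm25"
    LLM = "llm"

    @classmethod
    def _missing_(cls, value):
        # Fall back to a case-insensitive lookup so that "FIT" and
        # "fit" both resolve to FilterType.FIT instead of raising.
        if isinstance(value, str):
            for member in cls:
                if member.value == value.lower():
                    return member
        return None

assert FilterType("FIT") is FilterType.FIT
assert FilterType("fit") is FilterType.FIT
```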
Potential Solution:
Consider updating the schema's description field for the f parameter to also use lowercase ("Content-filter strategy: raw, fit, bm25, or llm") to remove the ambiguity that might confuse LLMs. https://github.com/unclecode/crawl4ai/blob/2140d9aca499580328886b0edfb0d6c4e502d4fd/deploy/docker/server.py#L225
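As a further guard against this kind of drift, the description string could be generated from the enum itself rather than hand-written (a sketch; the variable name and wording below are illustrative, not the server's actual field definition):

```python
from enum import Enum

class FilterType(str, Enum):
    RAW = "raw"
    FIT = "fit"
    BM25 = "bm25"
    LLM = "llm"

# Build the human-readable description from the enum values so the
# description can never disagree with the strict validation rules.
F_DESCRIPTION = "Content filter strategy: " + ", ".join(m.value for m in FilterType)
print(F_DESCRIPTION)  # Content filter strategy: raw, fit, bm25, llm
```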
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
1. Set up LibreChat to use Gemini models and configure it to use the Crawl4AI MCP tool.
2. Initiate a conversation or task in LibreChat that triggers the use of the `/mcp/md` endpoint with the `f` parameter set to what the LLM interprets as the "fit" strategy (e.g., asking it to summarize or extract content using the standard filter).
3. Observe the server logs for the Crawl4AI service or the error response in LibreChat.
Code snippets
OS
Linux
Python version
3.12
Browser
Vivaldi
Browser version
No response
Error logs & Screenshots (if applicable)
{
"error": 422,
"detail": "{\"detail\":[{\"type\":\"enum\",\"loc\":[\"body\",\"f\"],\"msg\":\"Input should be 'raw', 'fit', 'bm25' or 'llm'\",\"input\":\"FIT\",\"ctx\":{\"expected\":\"'raw', 'fit', 'bm25' or 'llm'\"}}]}"
}
@Dual-0 Hi! I've resolved the issue. The fix is now available in the 2025-MAY-2 branch. Could you help test it, too?
@ntohidi 2025-MAY-2 works for me
installation log
librechat@LLM:~/crawl4ai$ podman build --build-arg GITHUB_BRANCH=2025-MAY-2 --build-arg USE_LOCAL=true -t crawl4ai:may2
...
librechat@LLM:~/crawl4ai$ podman run -d --name crawl4ai-test --pod crawl4ai --shm-size 1g --env-file /var/lib/crawl4ai/.env -v /var/lib/crawl4ai/config.yaml:/app/config.yml:Z localhost/crawl4ai:may2
e64c1bd6e79101f1c42d2c07eb33f4ace5bc0df7a477031d58806a25f582bf00
librechat@LLM:~/crawl4ai$ podman logs crawl4ai-test
2025-05-20 18:50:16,255 INFO supervisord started with pid 1
2025-05-20 18:50:17,259 INFO spawned: 'redis' with pid 2
2025-05-20 18:50:17,262 INFO spawned: 'gunicorn' with pid 3
2:C 20 May 2025 18:50:17.274 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2:C 20 May 2025 18:50:17.275 # Redis version=7.0.15, bits=64, commit=00000000, modified=0, pid=2, just started
2:C 20 May 2025 18:50:17.275 # Configuration loaded
2:M 20 May 2025 18:50:17.275 * monotonic clock: POSIX clock_gettime
2:M 20 May 2025 18:50:17.276 * Running mode=standalone, port=6379.
2:M 20 May 2025 18:50:17.276 # Server initialized
2:M 20 May 2025 18:50:17.276 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2:M 20 May 2025 18:50:17.281 * Ready to accept connections
[2025-05-20 18:50:17 +0000] [3] [INFO] Starting gunicorn 23.0.0
[2025-05-20 18:50:17 +0000] [3] [INFO] Listening at: http://0.0.0.0:11235 (3)
[2025-05-20 18:50:17 +0000] [3] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-05-20 18:50:17 +0000] [9] [INFO] Booting worker with pid: 9
2025-05-20 18:50:18,945 INFO success: redis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-05-20 18:50:18,945 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
MCP server running on 0.0.0.0:11235
[2025-05-20 18:50:19 +0000] [9] [INFO] Started server process [9]
[2025-05-20 18:50:19 +0000] [9] [INFO] Waiting for application startup.
[INIT].... → Crawl4AI 0.6.3
[2025-05-20 18:50:20 +0000] [9] [INFO] Application startup complete.