
[Bug]: 422 Unprocessable Entity: Invalid value 'FIT' for 'f' parameter in /md endpoint called by LibreChat

Open Dual-0 opened this issue 6 months ago • 1 comments

crawl4ai version

0.6.0

Expected Behavior

The request generated by LibreChat with Gemini models and the /mcp/md endpoint should send the f parameter with one of the valid lowercase enum values ("raw", "fit", "bm25", or "llm"), specifically "fit" if the LLM intended that filter type, adhering to the enum definition.

Current Behavior

Description:

When the Crawl4AI /mcp/md endpoint is called, specifically via a system using LibreChat with Gemini models for tool interaction, a 422 Unprocessable Entity error is consistently returned. The error details indicate a validation failure for the f parameter in the request body, stating that the provided input value "FIT" is invalid.

Error Message:

{
  "error": 422,
  "detail": "{\"detail\":[{\"type\":\"enum\",\"loc\":[\"body\",\"f\"],\"msg\":\"Input should be 'raw', 'fit', 'bm25' or 'llm'\",\"input\":\"FIT\",\"ctx\":{\"expected\":\"'raw', 'fit', 'bm25' or 'llm'\"}}]}"
}
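Note that the `detail` field is a JSON document encoded as a string, so it has to be decoded twice before the offending value can be read. A minimal sketch of that decoding, using the response body exactly as reported in this issue (the helper name `first_validation_error` is hypothetical, not part of Crawl4AI or LibreChat):

```python
import json

# Response body as reported in this issue.
raw_response = r'''{
  "error": 422,
  "detail": "{\"detail\":[{\"type\":\"enum\",\"loc\":[\"body\",\"f\"],\"msg\":\"Input should be 'raw', 'fit', 'bm25' or 'llm'\",\"input\":\"FIT\",\"ctx\":{\"expected\":\"'raw', 'fit', 'bm25' or 'llm'\"}}]}"
}'''


def first_validation_error(body: str) -> dict:
    """Decode the outer response, then the JSON string nested in 'detail'."""
    outer = json.loads(body)
    inner = json.loads(outer["detail"])
    return inner["detail"][0]


err = first_validation_error(raw_response)
print("location:", err["loc"])       # which request field failed validation
print("rejected input:", err["input"])
print("expected:", err["ctx"]["expected"])
```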

Relevant Schema Information:

According to the server's public schema document (accessible, for example, at http://localhost:11235/mcp/schema), the definition for the f parameter within the md tool's inputSchema is as follows:

"f": {
  "$ref": "#/$defs/FilterType",
  "default": "fit",
  "description": "Content‑filter strategy: FIT, RAW, BM25, or LLM"
},
...
"$defs": {
  "FilterType": {
    "enum": [
      "raw",
      "fit",
      "bm25",
      "llm"
    ],
    "title": "FilterType",
    "type": "string"
  }
}

This schema clearly defines the f parameter as requiring one of the lowercase string values from the FilterType enum: "raw", "fit", "bm25", or "llm".
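To illustrate why "FIT" is rejected, here is a minimal standard-library sketch of a case-sensitive string enum. The class below is a stand-in that mirrors the enum values from the schema; it is an assumption for illustration and not a copy of the server's code.

```python
from enum import Enum


class FilterType(str, Enum):
    # Values mirror the schema's enum; the member names are illustrative.
    RAW = "raw"
    FIT = "fit"
    BM25 = "bm25"
    LLM = "llm"


def parse_filter(value: str) -> FilterType:
    """Look up a member by value; the comparison is case-sensitive."""
    return FilterType(value)


print(parse_filter("fit").value)  # the lowercase value is accepted

try:
    parse_filter("FIT")  # uppercase does not match any member value
except ValueError as exc:
    print("rejected:", exc)
```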

Problem:

The error message confirms that the server is strictly validating the f parameter against the lowercase values defined in the enum. The request generated by the LibreChat/Gemini setup is sending the value "FIT" (uppercase), which does not match any of the required lowercase enum values.

This indicates that either the LibreChat tool calling logic or the Gemini model's interpretation of the schema is generating the incorrect parameter value case.

Potential Cause of LLM Behavior:

The schema itself presents the valid options for the f parameter in two slightly conflicting ways regarding case:

  1. The strict enum definition requires lowercase ("raw", "fit", etc.).
  2. The human-readable description provides the options in uppercase ("Content‑filter strategy: FIT, RAW, BM25, or LLM").

It is highly probable that the LLM, in processing the tool schema description to determine the correct parameter value, picked up the uppercase format provided in the description field, leading it to generate "FIT" instead of the strictly required "fit" from the enum. The server's validation, correctly adhering to the enum definition, rejects this uppercase value.
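The mismatch between the two representations can be detected mechanically. The following is a rough sketch of such a check, assuming the `f` entry sits under a `properties` key of the tool's inputSchema (the surrounding structure is elided in the excerpt above, so that nesting is an assumption); the function name `case_mismatches` is hypothetical.

```python
import re

# Schema fragment from this issue; the "properties" wrapper is assumed.
schema = {
    "properties": {
        "f": {
            "$ref": "#/$defs/FilterType",
            "default": "fit",
            "description": "Content‑filter strategy: FIT, RAW, BM25, or LLM",
        }
    },
    "$defs": {
        "FilterType": {
            "enum": ["raw", "fit", "bm25", "llm"],
            "title": "FilterType",
            "type": "string",
        }
    },
}


def case_mismatches(schema: dict, prop: str) -> list[str]:
    """Return description words that match an enum value only when case is ignored."""
    spec = schema["properties"][prop]
    def_name = spec["$ref"].rsplit("/", 1)[-1]  # "#/$defs/FilterType" -> "FilterType"
    allowed = set(schema["$defs"][def_name]["enum"])
    lowered = {value.lower() for value in allowed}
    words = re.findall(r"[A-Za-z0-9_]+", spec["description"])
    return [w for w in words if w not in allowed and w.lower() in lowered]


print(case_mismatches(schema, "f"))
```

Any non-empty result points at a description that advertises values in a different case than the enum accepts.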

Potential Solution:

Consider updating the schema's description field for the f parameter to use the same lowercase values as the enum ("Content‑filter strategy: raw, fit, bm25, or llm"), which removes the ambiguity that might confuse LLMs. Relevant line: https://github.com/unclecode/crawl4ai/blob/2140d9aca499580328886b0edfb0d6c4e502d4fd/deploy/docker/server.py#L225
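As a complementary, purely illustrative option, the server could also tolerate case differences when parsing the value. The sketch below uses the standard library's `Enum._missing_` hook. The class name `LenientFilterType` is hypothetical, this is not necessarily how the project addressed the issue, and whether a given web framework's validator honors `_missing_` would need to be checked separately.

```python
from enum import Enum


class LenientFilterType(str, Enum):
    # Same values as the schema's enum; the class itself is a hypothetical stand-in.
    RAW = "raw"
    FIT = "fit"
    BM25 = "bm25"
    LLM = "llm"

    @classmethod
    def _missing_(cls, value):
        """Called when no member has this exact value; retry ignoring case."""
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value == lowered:
                    return member
        return None  # unknown values still raise ValueError


print(LenientFilterType("FIT").value)
print(LenientFilterType("fit").value)
```

Fixing the description remains the simpler change, since it addresses the cause rather than the symptom.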

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce

1.  Set up LibreChat to use Gemini models and configure it to use the Crawl4AI MCP tool.
2.  Initiate a conversation or task in LibreChat that triggers the use of the `/mcp/md` endpoint with the `f` parameter set to what the LLM interprets as the "fit" strategy (e.g., asking it to summarize or extract content using the standard filter).
3.  Observe the server logs for the Crawl4AI service or the error response in LibreChat.

Code snippets


OS

Linux

Python version

3.12

Browser

Vivaldi

Browser version

No response

Error logs & Screenshots (if applicable)

{
  "error": 422,
  "detail": "{\"detail\":[{\"type\":\"enum\",\"loc\":[\"body\",\"f\"],\"msg\":\"Input should be 'raw', 'fit', 'bm25' or 'llm'\",\"input\":\"FIT\",\"ctx\":{\"expected\":\"'raw', 'fit', 'bm25' or 'llm'\"}}]}"
}

Dual-0 avatar May 03 '25 22:05 Dual-0

@Dual-0 Hi! I’ve resolved the issue. The fix is now available in the 2025-MAY-2 branch. Could you help test it, too?

ntohidi avatar May 16 '25 07:05 ntohidi

@ntohidi The 2025-MAY-2 branch works for me.


installation log

librechat@LLM:~/crawl4ai$ podman build --build-arg GITHUB_BRANCH=2025-MAY-2 --build-arg USE_LOCAL=true -t crawl4ai:may2
...
librechat@LLM:~/crawl4ai$ podman run -d --name crawl4ai-test   --pod crawl4ai   --shm-size 1g   --env-file /var/lib/crawl4ai/.env   -v /var/lib/crawl4ai/config.yaml:/app/config.yml:Z   localhost/crawl4ai:may2
e64c1bd6e79101f1c42d2c07eb33f4ace5bc0df7a477031d58806a25f582bf00
librechat@LLM:~/crawl4ai$ podman logs crawl4ai-test
2025-05-20 18:50:16,255 INFO supervisord started with pid 1
2025-05-20 18:50:17,259 INFO spawned: 'redis' with pid 2
2025-05-20 18:50:17,262 INFO spawned: 'gunicorn' with pid 3
2:C 20 May 2025 18:50:17.274 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2:C 20 May 2025 18:50:17.275 # Redis version=7.0.15, bits=64, commit=00000000, modified=0, pid=2, just started
2:C 20 May 2025 18:50:17.275 # Configuration loaded
2:M 20 May 2025 18:50:17.275 * monotonic clock: POSIX clock_gettime
2:M 20 May 2025 18:50:17.276 * Running mode=standalone, port=6379.
2:M 20 May 2025 18:50:17.276 # Server initialized
2:M 20 May 2025 18:50:17.276 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2:M 20 May 2025 18:50:17.281 * Ready to accept connections
[2025-05-20 18:50:17 +0000] [3] [INFO] Starting gunicorn 23.0.0
[2025-05-20 18:50:17 +0000] [3] [INFO] Listening at: http://0.0.0.0:11235 (3)
[2025-05-20 18:50:17 +0000] [3] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-05-20 18:50:17 +0000] [9] [INFO] Booting worker with pid: 9
2025-05-20 18:50:18,945 INFO success: redis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-05-20 18:50:18,945 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
MCP server running on 0.0.0.0:11235
[2025-05-20 18:50:19 +0000] [9] [INFO] Started server process [9]
[2025-05-20 18:50:19 +0000] [9] [INFO] Waiting for application startup.
[INIT].... → Crawl4AI 0.6.3
[2025-05-20 18:50:20 +0000] [9] [INFO] Application startup complete.

Dual-0 avatar May 20 '25 19:05 Dual-0