
The "payload limit" parameter seems to have no effect?

Open 12210122 opened this issue 1 year ago • 0 comments

System Info

Thanks a lot for contributing such a great embedding framework! However, I've run into a problem using it and would like to ask for help. I set the payload limit to 8000000 (or any value above 2 MB), but whenever my payload size exceeds 2 MB (measured with f"{sys.getsizeof(json.dumps(payload))/(1024*1024)} MB"), the server still responds with HTTP 413 and "Failed to buffer the request body: length limit exceeded".
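As a side note on the measurement itself: sys.getsizeof reports the in-memory size of the Python str object (including interpreter overhead), not the number of bytes sent over HTTP. A minimal sketch of the difference (payload contents are placeholders):

```python
import json
import sys

payload = {"input": ["sentence1", "sentence2"]}
body = json.dumps(payload)

# In-memory size of the Python str object (includes CPython overhead).
print(f"getsizeof: {sys.getsizeof(body) / (1024 * 1024):.4f} MB")

# Size of the actual HTTP request body the server has to buffer.
print(f"encoded:   {len(body.encode('utf-8')) / (1024 * 1024):.4f} MB")
```

The two numbers are close for large payloads, so this doesn't explain the error, but the encoded length is the value the server-side limit is compared against.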

I'm using TEI version 1.2.3, started via docker run. I call the API at 127.0.0.1:8080/embeddings with headers = { 'Content-Type': 'application/json' } and payload = { "input": ["sentence1", ... , "sentence_n"] }. The result of 127.0.0.1:8080/info is:

    {'model_id': '/data/bge-m3', 'model_sha': None, 'model_dtype': 'float16',
     'model_type': {'embedding': {'pooling': 'cls'}}, 'max_concurrent_requests': 512,
     'max_input_length': 8192, 'max_batch_tokens': 16384, 'max_batch_requests': None,
     'max_client_batch_size': 512, 'auto_truncate': False, 'tokenization_workers': 32,
     'version': '1.2.3', 'docker_label': 'sha-cc1c510'}
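For reference, a minimal sketch of the request described above (assuming the server at 127.0.0.1:8080 and the OpenAI-style response shape of the /embeddings route; the sentences are placeholders):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    headers={"Content-Type": "application/json"},
    json={"input": ["sentence1", "sentence2"]},
)
resp.raise_for_status()
# Assuming the OpenAI-compatible response: one embedding per input string.
print(len(resp.json()["data"]))
```

This works fine as long as the encoded body stays under 2 MB; the problem only appears above that size.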

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

The docker command I use is similar to the following:

    docker run -d --name text-embeddings-inference-0 --gpus 'device=0' \
      --shm-size 1gb -p 8080:80 --pull always \
      ghcr.io/huggingface/text-embeddings-inference:1.2 \
      --model-id BAAI/bge-m3 --payload-limit 8000000 \
      --max-batch-tokens 4096000 --max-client-batch-size 512
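Against that container, a client-side sketch that reproduces the 413 (the sentence content is arbitrary; it only needs to push the encoded JSON body past the 2 MB default while staying under the other limits reported by /info):

```python
import json

import requests

# ~3.3 MB of JSON: above the 2 MB default body limit, but well below
# the requested --payload-limit of 8000000 bytes. 500 inputs stays
# under max_client_batch_size = 512.
payload = {"input": ["some fairly long sentence " * 250] * 500}
body = json.dumps(payload)
print(f"body size: {len(body.encode('utf-8')) / (1024 * 1024):.2f} MB")

resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    headers={"Content-Type": "application/json"},
    data=body,
)
print(resp.status_code)  # reported behavior: 413
print(resp.text)         # "Failed to buffer the request body: length limit exceeded"
```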

Expected behavior

I'd like to know how to set the payload limit correctly. Does this parameter need to be coordinated with other settings, or does the client also have to configure something on its side?

12210122 • May 16 '24 02:05