text-embeddings-inference
The "payload limit" parameter seems to have no effect?
System Info
Thanks a lot for contributing such a great embedding framework! However, I've run into a problem using it and would like to ask for help. I set the payload limit to 8000000 (or any number over 2 MB), but when my payload size exceeds 2 MB (measured with `f"{sys.getsizeof(json.dumps(payload))/(1024*1024)} MB"`), the server still responds with HTTP 413: "Failed to buffer the request body: length limit exceeded".
I'm using TEI version 1.2.3, launched via `docker run`. I call the `127.0.0.1:8080/embeddings` API with:

```python
headers = {"Content-Type": "application/json"}
payload = {"input": ["sentence1", ..., "sentence_n"]}
```

The result of `127.0.0.1:8080/info` is:

```json
{
  "model_id": "/data/bge-m3",
  "model_sha": null,
  "model_dtype": "float16",
  "model_type": {"embedding": {"pooling": "cls"}},
  "max_concurrent_requests": 512,
  "max_input_length": 8192,
  "max_batch_tokens": 16384,
  "max_batch_requests": null,
  "max_client_batch_size": 512,
  "auto_truncate": false,
  "tokenization_workers": 32,
  "version": "1.2.3",
  "docker_label": "sha-cc1c510"
}
```
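For completeness, a minimal sketch of the client call described above (assuming the `requests` library; the payload content is a placeholder, not my real data):

```python
import json
import sys

import requests

payload = {"input": ["sentence1", "sentence2"]}  # real payload has many more sentences
body = json.dumps(payload)

# Size as measured in the issue (note: sys.getsizeof includes Python object
# overhead, so it slightly overstates the actual wire size).
print(f"{sys.getsizeof(body) / (1024 * 1024)} MB")
# Actual number of bytes sent over the wire:
print(f"{len(body.encode('utf-8')) / (1024 * 1024)} MB")

resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    headers={"Content-Type": "application/json"},
    data=body,
)
print(resp.status_code, resp.text)  # 413 "Failed to buffer the request body: ..."
```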
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
The docker command I use is similar to the following:

```shell
docker run -d --name text-embeddings-inference-0 --gpus 'device=0' \
  --shm-size 1gb -p 8080:80 --pull always \
  ghcr.io/huggingface/text-embeddings-inference:1.2 \
  --model-id BAAI/bge-m3 --payload-limit 8000000 \
  --max-batch-tokens 4096000 --max-client-batch-size 512
```
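A hypothetical repro script (not part of my original setup; the sentence length and count are arbitrary, chosen to stay within `max_client_batch_size` while exceeding the 2 MB default payload limit):

```python
# Build a request body just over 2 MB and POST it. With --payload-limit
# 8000000 this should be accepted, but the server still answers 413.
import json

import requests

# 512 inputs x ~5 KB each ~= 2.4 MB of JSON, within max_client_batch_size.
payload = {"input": ["a" * 5000 for _ in range(512)]}
body = json.dumps(payload)
print(f"payload size: {len(body.encode('utf-8')) / (1024 * 1024):.2f} MB")

resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    headers={"Content-Type": "application/json"},
    data=body,
)
print(resp.status_code, resp.text)
```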
Expected behavior
I'd like to know how to set the payload limit correctly. Does this parameter need to be coordinated with other settings, for example something the client has to configure on its side?