Torch with Gunicorn + Flask API performance issue on Docker
I use Gunicorn as the web server for a Flask API and I see a performance issue compared with using Waitress as the web server for the same Flask app. When I run repeated element-wise matrix operations with NumPy, there is no big difference in response time between Gunicorn and Waitress.
Numpy API

```python
@app.route('/numpy')
def _numpy():
    matrix_a = np.random.rand(640, 640, 3)
    count = 0
    while count < 240:
        matrix_a = (matrix_a ** 2) % 7  # Element-wise squaring and modulo
        count += 1
    return jsonify({"message": "Hello, World!"})
```
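(For context, the same loop can be timed outside any server to get a compute-only baseline, so server overhead can be separated from raw compute time — this standalone snippet is my own illustration, not part of the app:)

```python
import time

import numpy as np

# Same workload as the /numpy route: 240 rounds of element-wise square-and-modulo
matrix_a = np.random.rand(640, 640, 3)
start = time.perf_counter()
for _ in range(240):
    matrix_a = (matrix_a ** 2) % 7
elapsed = time.perf_counter() - start
print(f"compute-only time: {elapsed:.4f}s")
```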
But when I run the same operation with torch (both with and without torch.no_grad):
Torch API

```python
@app.route('/torch')
def _torch():
    matrix_a = torch.rand(640, 640, 3)  # Create a random tensor
    count = 0
    while count < 240:
        matrix_a = (matrix_a ** 2) % 7  # Element-wise squaring and modulo
        count += 1
    return jsonify({"message": "Hello, World!"})
```
Torch_no_grad API

```python
@app.route('/torch_no_grad')
def _torch_ng():
    with torch.no_grad():
        matrix_a = torch.rand(640, 640, 3)  # Create a random tensor
        count = 0
        while count < 240:
            matrix_a = (matrix_a ** 2) % 7  # Element-wise squaring and modulo
            count += 1
    return jsonify({"message": "Hello, World!"})
```
there is a huge difference in response time:
limits:
memory: 1g
cpus: '8.0'
numpy
----------
waitress: Mean=1.1698s, Std=0.0300s
gunicorn: Mean=1.1715s, Std=0.0311s
torch
----------
waitress: Mean=0.9230s, Std=0.1078s
gunicorn: Mean=0.8869s, Std=0.1190s
torch_no_grad
----------
waitress: Mean=0.9172s, Std=0.1058s
gunicorn: Mean=0.8886s, Std=0.1126s
limits:
memory: 1g
cpus: '4.0'
numpy
----------
waitress: Mean=1.1876s, Std=0.0407s
gunicorn: Mean=1.1897s, Std=0.0390s
torch
----------
waitress: Mean=0.9502s, Std=0.1281s
gunicorn: Mean=0.9180s, Std=0.1288s
torch_no_grad
----------
waitress: Mean=0.9119s, Std=0.1063s
gunicorn: Mean=0.8678s, Std=0.1105s
limits:
memory: 1g
cpus: '2.0'
numpy
----------
waitress: Mean=1.1881s, Std=0.0494s
gunicorn: Mean=1.1835s, Std=0.0424s
torch
----------
waitress: Mean=0.7837s, Std=0.1328s
gunicorn: Mean=1.3097s, Std=0.0544s
torch_no_grad
----------
waitress: Mean=0.7932s, Std=0.0988s
gunicorn: Mean=1.3300s, Std=0.1083s
I evaluated this on a MacBook Air M2 with 16 GB of RAM.
This is the client script that sends requests to Gunicorn and Waitress:
```python
import asyncio
import time
from collections import defaultdict

import httpx
import numpy as np

N = 1
url_paths = ["numpy", "torch", "torch_no_grad"]
API_URLS = [
    "http://localhost:8001/",
    "http://localhost:8002/",
]
API_URLS_DICT = {
    "http://localhost:8001/": "waitress",
    "http://localhost:8002/": "gunicorn",
}

async def fetch(client, url, url_path):
    start_time = time.perf_counter()  # Start timing
    response = await client.get(url + url_path, timeout=20.0)
    end_time = time.perf_counter()  # End timing
    response_time = end_time - start_time  # Calculate response time
    return {
        "url": url,
        "status": response.status_code,
        "response_time": response_time,
        "data": response.json(),
    }

async def main(url_path):
    async with httpx.AsyncClient() as client:
        tasks = [fetch(client, url, url_path) for url in API_URLS for _ in range(N)]
        results = await asyncio.gather(*tasks)
    return results

if __name__ == "__main__":
    repeat_time = 5
    for url_path in url_paths:
        count = defaultdict(list)
        print(url_path)
        print('----------')
        for _ in range(repeat_time):
            y = asyncio.run(main(url_path))
            for x in y:
                count[API_URLS_DICT[x['url']]].append(x['response_time'])
        for k, v in count.items():
            v = np.array(v)
            print(f"{k}: Mean={v.mean():.4f}s, Std={v.std():.4f}s")
        print()
```
Thanks for the detailed report.
How did you launch the test targets? Specifically, I am inquiring about the command lines containing the localhost:8001 (resp. localhost:8002) listen address. I am assuming you are testing against Gunicorn 23.0 on Python 3.11, correct?
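(For example, something like the following — these commands are hypothetical; the actual invocations are exactly what I'm asking for:)

```shell
# Hypothetical launch commands for the two test targets
# Waitress on port 8001:
waitress-serve --listen=localhost:8001 app:app
# Gunicorn on port 8002 (default sync worker):
gunicorn -b localhost:8002 app:app
```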
The Python version is 3.10; here is the Dockerfile:
```dockerfile
# Use official Python image
FROM python:3.10

# Set the working directory
WORKDIR /app

# Copy the application files
COPY app.py requirements.txt ./

# Install dependencies
RUN pip install -r requirements.txt

# Install curl for health check
RUN apt-get update && apt-get install -y curl

# Expose port 8002
EXPOSE 8002

# Run the app with Gunicorn (use default worker count)
CMD ["gunicorn", "-b", "0.0.0.0:8002", "app:app"]
```
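The container is run with the memory/CPU limits shown in the benchmarks; a run command matching those limits would look something like this (the image tag is illustrative):

```shell
# Build and run with the limits used in the cpus: '2.0' benchmark
docker build -t gunicorn-app .
docker run --rm --memory=1g --cpus=2.0 -p 8002:8002 gunicorn-app
```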
Note: there is no difference in performance with or without the health check.
Did you try the thread worker?
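For reference, switching Gunicorn to its threaded (gthread) worker would look something like this — the worker and thread counts below are illustrative, not a recommendation:

```shell
# Run with the gthread worker class instead of the default sync worker
gunicorn -b 0.0.0.0:8002 --worker-class gthread --workers 2 --threads 4 app:app
```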