SSL handshake timed out and channel closed exceptions
The simplified code
import os
import secrets

import uvicorn
from elasticsearch import AsyncElasticsearch
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials

application = FastAPI()
security = HTTPBasic()

es_client = AsyncElasticsearch([os.getenv('ES_URI')], maxsize=10000)
es_index = os.getenv('ES_INDEX')
es_doc_type = os.getenv('ES_DOC_TYPE')


def authorize(credentials: HTTPBasicCredentials = Depends(security)):
    # HTTP Basic auth dependency; expected credentials come from environment variables
    correct_username = secrets.compare_digest(credentials.username, os.getenv("BASIC_AUTH_USERNAME"))
    correct_password = secrets.compare_digest(credentials.password, os.getenv("BASIC_AUTH_PASSWORD"))
    if not (correct_username and correct_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Basic"},
        )
    return True


# Simplified function_score query; the real bool clause and source fields are omitted
query = {
    'query': {
        'function_score': {
            'query': {
                'bool': ...
            },
            'score_mode': 'sum',
            'boost_mode': 'sum'
        }
    },
    'size': 6,
    '_source': [...]
}


@application.get("/api/v1/smart-search", dependencies=[Depends(authorize)])
async def search(size: int = 12, model: str = None, carSize: int = None):
    ...
    es_res = await es_client.search(index=es_index, doc_type=es_doc_type, body=query, max_concurrent_shard_requests=5000)
    return es_res


if __name__ == "__main__":
    uvicorn.run(application, host="0.0.0.0", port=5000)
Description
The code above is simplified: it builds a dynamic ES query, executes it with AsyncElasticsearch, and the result is parsed and returned. I tried multiple configurations, among others (an alternative client configuration is sketched after this list):
- maxsize=10000 in AsyncElasticsearch client definition
- max_concurrent_shard_requests in search
- etc.
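For illustration, a more conservative client configuration might look like the sketch below. The parameter values are assumptions for the example, not settings used in the issue; maxsize, timeout, max_retries and retry_on_timeout are standard elasticsearch-py client options.
# Sketch: a bounded connection pool with explicit timeouts and retries,
# instead of maxsize=10000; the values are illustrative only.
es_client = AsyncElasticsearch(
    [os.getenv('ES_URI')],
    maxsize=25,             # connections kept open per node
    timeout=30,             # per-request timeout in seconds
    max_retries=3,
    retry_on_timeout=True,
)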
The application runs in Docker. This is the Dockerfile CMD:
CMD gunicorn --do-handshake-on-connect --worker-class uvicorn.workers.UvicornWorker --worker-connections 1000 --workers 3 --bind 0.0.0.0:5000 api
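For reference, the same settings can also be expressed as a gunicorn config file; the sketch below (assuming a hypothetical gunicorn.conf.py) mirrors the CMD above using gunicorn's standard setting names.
# gunicorn.conf.py -- sketch mirroring the CMD above; the file name is an assumption
bind = "0.0.0.0:5000"
workers = 3
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
do_handshake_on_connect = True  # corresponds to the --do-handshake-on-connect flag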
I tried running with:
- Uvicorn standalone and gunicorn with the uvicorn worker as the worker class
- Different numbers of workers, worker connections, etc.
- With and without --do-handshake-on-connect
All this runs fine for:
- Local setup (API running locally in Docker against a local ES instance) with as many concurrent users as wanted
- API running in a Kubernetes environment (2 to 5 pods) against an AWS c5.xlarge.elasticsearch cluster, up to about 60 concurrent users
The API starts failing in the Kubernetes environment (2 to 5 pods, AWS c5.xlarge.elasticsearch cluster) from roughly 70 concurrent users onwards. The load generated on the API pods is not too high and the load on ES is low.
Then about 70% of the calls fail, mainly with the following errors:
- SSLException: handshake timed out (59%)
- ClosedChannelException (36%)
I don't have more information; this is all that the Gatling load-testing tool reports.
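One way to capture more detail than Gatling reports would be to turn up the elasticsearch-py transport loggers inside the service; a minimal sketch (the logger names are the library's standard ones, the placement and levels are assumptions):
import logging

# Sketch: verbose transport logging so handshake/connection errors show up
# with full tracebacks in the pod logs.
logging.basicConfig(level=logging.INFO)
logging.getLogger("elasticsearch").setLevel(logging.DEBUG)        # transport and connection events
logging.getLogger("elasticsearch.trace").setLevel(logging.DEBUG)  # request/response bodies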
Environment
requirements.txt
elasticsearch[async]>=7.12.0
starlette==0.13.6
elastic-apm==5.8.1
fastapi==0.60.1
uvicorn==0.11.8
gunicorn==20.1.0
- Python version: 3.6 (base image: FROM python:3.6-alpine)
I'm not familiar with do-handshake-on-connect in gunicorn, but from reading their docs it seems it's passed down to the wrap_socket method, so there's something I don't get: why don't you have certs / a keyfile?
Do you have the same behaviour without this flag set?
In any case a minimal reproducible example would help; without one I doubt we can get to the bottom of it.
@euri10
- The behavior is the same with and without do-handshake-on-connect.
- The application runs in Kubernetes, where the certificate handling is done.
For now we spin up multiple instances of the service to work around the issue.
Unfortunately I cannot share the connection string to Elasticsearch as this contains proprietary information.
Don't you at least have the full traceback? It's rather hard to get a feel for what's happening with only half the picture.
I'm mentioning the above PR just in case, @SG87. It might be totally unrelated, but:
- I'm closing the transport in the PR above in order to fix a "not properly closed" resource warning in the SSL tests,
- your issue looks related in the sense that above a certain level of concurrency you get SSL failures.
Hope that makes sense :dromedary_camel:
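For completeness, explicitly closing the async client on shutdown in the snippet above would look roughly like the sketch below; this only illustrates closing the transport and is not a confirmed fix for the SSL errors.
# Sketch: close the AsyncElasticsearch transport when the application shuts down,
# so connections are not left half-open across worker restarts.
@application.on_event("shutdown")
async def close_es_client():
    await es_client.close()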
Closing this as stale. Feel free to open a new issue with an MRE. :pray: