flask-restx
Swagger schema creation can crash if multiple requests arrive quickly on startup [theory]
Hello flask-restx team!
This is a bit of a nasty one, sorry! We have recently observed, twice, a crash (call stack below) during Swagger schema rendering (Swagger(self).as_dict()) on application startup, when the application receives its first request. The exception being thrown ("dictionary changed size during iteration") suggests a threading issue: multiple threads concurrently trying to build the Swagger schema, whose result is cached on the Api instance when the first request that requires validation arrives (or when the swagger-ui URL is loaded). As flask-restx neither creates threads nor uses locks around this step, it appears that Swagger schema generation is not thread-safe. If multiple requests arrive very quickly at application startup (and Flask is running with threaded=True), data corruption and crashes can happen during schema rendering. Please note this is just my theory on the root cause, and I'm submitting this issue to hear from anyone else in case I've assumed wrong. The crash happens randomly (we've seen it twice in the last week), and despite trying, I have unfortunately not yet found a way to reproduce it.
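For context, "dictionary changed size during iteration" is the RuntimeError CPython raises when a dict is resized while an iterator over it is still in use, as in the `self.items()` loop at the bottom of the call stack. The single-threaded illustration below is not flask-restx code (the dict contents and the function name are hypothetical); in the suspected race, the mutation would come from a second request thread rather than from the loop body itself.

```python
def mutate_during_iteration():
    """Return the error message raised when a dict grows mid-iteration."""
    fields = {"id": 1, "name": 2}
    try:
        for key, value in fields.items():
            # Inserting a new key resizes the dict while the items() iterator is active.
            fields[key + "_copy"] = value
    except RuntimeError as exc:
        return str(exc)
    return None


print(mutate_during_iteration())  # dictionary changed size during iteration
```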
As for a fix, it would seem that a lock should be used to guarantee thread-safety of the Swagger() constructor. I would be happy to work on a PR for that if advised by flask-restx maintainers.
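The lock-based fix proposed above could look roughly like the double-checked locking sketch below. It is not flask-restx code: `LazySchema` and `slow_builder` are hypothetical names, and `builder` stands in for something like `lambda: Swagger(api).as_dict()`. Only the first thread to acquire the lock runs the builder; the others wait and then reuse the cached result.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional


class LazySchema:
    """Build a schema at most once, even when many threads request it together.

    Hypothetical helper, not part of flask-restx.
    """

    def __init__(self, builder: Callable[[], Dict[str, Any]]) -> None:
        self._builder = builder
        self._lock = threading.Lock()
        self._schema: Optional[Dict[str, Any]] = None

    def get(self) -> Dict[str, Any]:
        # Fast path: once built, return the cached schema without locking.
        schema = self._schema
        if schema is not None:
            return schema
        with self._lock:
            # Re-check: another thread may have built it while we waited.
            if self._schema is None:
                # If the builder raises, _schema stays None and a later call retries.
                self._schema = self._builder()
            return self._schema


calls = []


def slow_builder() -> Dict[str, Any]:
    calls.append(1)
    time.sleep(0.05)  # widen the window in which other threads pile up
    return {"swagger": "2.0", "definitions": {}}


lazy = LazySchema(slow_builder)
threads = [threading.Thread(target=lazy.get) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1: the builder ran exactly once
```

The unlocked fast path relies on a single attribute read being atomic, which holds in CPython; once the schema is built, requests no longer contend on the lock.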
Code
Happy to provide code, in particular the model definitions we use, if it helps. However, as this is a largish application and the call stack indicates a non-reproducible threading condition, my thought is that the root cause is not directly related to our model definitions, so I initially wanted to seek advice on a course of action based on the call stack and my interpretation. We do have Nested fields, but only a single level of nesting.
Repro Steps (if applicable)
Sorry, not known.
Expected Behavior
If multiple requests reach the server quickly on startup, schema creation should be synchronized to ensure it is created before any request is processed.
Actual Behavior
If schema creation fails, the application continues to run, but requests that expect validation can then crash during validation when the schema is referenced, indicating a corrupt or incomplete schema. For example, we see this:
Traceback (most recent call last):
  File "/home/app/.local/lib/python3.8/site-packages/jsonschema/validators.py", line 966, in resolve_fragment
    document = document[part]
KeyError: 'definitions'
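The KeyError above is the shape of failure you get when a local reference such as "#/definitions/<name>" is followed step by step (the `document = document[part]` line in the traceback) through a schema that has no "definitions" section. The sketch below is a simplified, pure-Python stand-in for that lookup, not the jsonschema implementation; the model name "User" is hypothetical.

```python
from typing import Any, Dict


def resolve_fragment(document: Dict[str, Any], ref: str) -> Any:
    """Follow a local JSON reference such as '#/definitions/User'.

    Simplified: no JSON-pointer escaping (~0, ~1) and no remote references.
    """
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref}")
    node = document
    for part in ref[2:].split("/"):
        node = node[part]  # raises KeyError if a path segment is missing
    return node


complete = {"definitions": {"User": {"type": "object"}}}
incomplete = {"paths": {}}  # e.g. a schema whose definitions were never filled in

print(resolve_fragment(complete, "#/definitions/User"))  # {'type': 'object'}
try:
    resolve_fragment(incomplete, "#/definitions/User")
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'definitions'
```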
Error Messages/Stack Trace
2023-05-29 11:52:47,766 ERROR T140221658154752 [api.schema] Unable to render schema
Traceback (most recent call last):
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/api.py", line 573, in __schema__
    self._schema = Swagger(self).as_dict()
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 275, in as_dict
    serialized = self.serialize_resource(
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 482, in serialize_resource
    path[method] = self.serialize_operation(doc, method)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 488, in serialize_operation
    "responses": self.responses_for(doc, method) or None,
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 622, in responses_for
    responses[code]["schema"] = self.serialize_schema(d["model"])
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 672, in serialize_schema
    self.register_model(model)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 703, in register_model
    self.register_field(field)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 713, in register_field
    self.register_field(field.container)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 711, in register_field
    self.register_model(field.nested)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/fields.py", line 261, in nested
    return getattr(self.model, "resolved", self.model)
  File "/home/app/.local/lib/python3.8/site-packages/werkzeug/utils.py", line 109, in __get__
    value = self.fget(obj)  # type: ignore
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 176, in resolved
    resolved = copy.deepcopy(self)
  File "/usr/local/lib/python3.8/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 236, in __deepcopy__
    [(key, copy.deepcopy(value, memo)) for key, value in self.items()],
  File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 236, in <listcomp>
^^^ Note the multiple requests arriving on different threads within the same second as the crash, logged after the call stack ^^^
Environment
- Python version 3.8.10
- Flask version 2.0.3
- Flask-RESTX version 1.0.6
- Other installed Flask extensions (none)
Thanks for your time.
@peterhorsley How is this application being deployed? I suspect you are probably correct that flask-restx is not designed to be thread-safe! However, I have a production application deployed on AWS EB with gunicorn and I have never seen this issue when scaling, so I'm wondering whether it is related to the Flask development server.
@peter-doggart We can reproduce it in both production and dev environments. Our production environment is deployed in Docker containers on AWS k8s using Apache, specifically with the Python mod_wsgi package. We can also reproduce it with the Flask dev server by using Locust to hammer the server with requests on startup. For now, we have implemented a workaround: a global lock in Flask's @app.before_request handler that forces generation of the schema by accessing the Api's internal __schema__ attribute, like this:
import json
import logging

@app.before_request
def before_request():
    # MyApp.request_lock is assumed to be a threading.Lock shared by all request threads
    with MyApp.request_lock:
        if not MyApp.schema_generated:
            logging.info('Generating swagger spec for api')
            json.dumps(MyApp.api.__schema__)  # <-- Force flask-restx schema to be generated
            MyApp.schema_generated = True
But of course it would be better to fix this in flask-restx so that this is not needed.
I'm facing this problem too. The problem is that my schema is partially generated, so I don't know if I can use the same approach as the solution mentioned by @peterhorsley.
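On the partially generated schema: one way to tell whether forcing generation produced a usable result is to inspect the dict that `api.__schema__` returned before marking it as generated. The sketch below rests on two assumptions: that a failed render comes back as a dict with an "error" key (as the "Unable to render schema" log line suggests), and that models are referenced as "#/definitions/<name>" in Swagger 2.0 style. `iter_refs` and `schema_problems` are hypothetical helpers, not flask-restx API.

```python
from typing import Any, Dict, List, Set
from urllib.parse import unquote


def iter_refs(node: Any) -> List[str]:
    """Collect every "$ref" string found anywhere in a JSON-like structure."""
    refs: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.append(value)
            else:
                refs.extend(iter_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.extend(iter_refs(item))
    return refs


def schema_problems(schema: Dict[str, Any]) -> List[str]:
    """Return reasons why a rendered schema looks failed or incomplete."""
    if "error" in schema:  # assumed shape of the failure fallback
        return [f"schema rendering failed: {schema['error']}"]
    defined: Set[str] = set(schema.get("definitions", {}))
    prefix = "#/definitions/"
    problems = []
    for ref in iter_refs(schema):
        # Model names may be percent-encoded inside the reference.
        if ref.startswith(prefix) and unquote(ref[len(prefix):]) not in defined:
            problems.append(f"dangling reference: {ref}")
    return problems


good = {
    "definitions": {"User": {"type": "object"}},
    "paths": {"/users": {"get": {"responses": {"200": {"schema": {"$ref": "#/definitions/User"}}}}}},
}
partial = {"paths": good["paths"]}  # same paths, but the definitions are missing

print(schema_problems(good))  # []
print(schema_problems(partial))  # ['dangling reference: #/definitions/User']
print(schema_problems({"error": "Unable to render schema"}))
```

In the before_request workaround, one could set `MyApp.schema_generated = True` only when this check returns an empty list, so that a failed first attempt is retried on the next request instead of being treated as done.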