marshmallow_dataclasses.class_schema() fails in multithreaded app due to thread-safety issue
I came across a marshmallow-dataclass-related race condition in my new code that was resulting in the marshmallow exception when using class_schema() while marshalling and unmarshalling dataclasses from threads of a multi-threaded application:
marshmallow.exceptions.RegistryError: Class with name '{{a marshmallow data class that I defined}}' was not found. You may need to import the class
I root caused the exception to the following. My code was invoking marshmallow_dataclass.class_schema() from threads:
request = class_schema({{any one of a number of my dataclasses}})().load(…)
It turns out that class_schema() may invoke marshmallow.class_registry.register(), which is not thread-safe. It performs operations on a global dict, and marshmallow.class_registry.register() is in excess of a single GIL-protected byte code.
I have not yet come up with a simple test case, but it's easy to see via simple code inspection that marshmallow.class_registry.register() (and class_schema() as a result) is not thread-safe: https://github.com/marshmallow-code/marshmallow/blob/a651fbcb85f4a2fc1bbc9df168e66476854fe178/src/marshmallow/class_registry.py#L26-L69.
I was able to work around the race condition by forcing generation of the schema at import time by calling class_schema() on all of my dataclasses at the top level of the module. However, in a large code base with multiple contributors, it's nearly impossible to ensure that a workaround like that won't be missed and show up in production, since some of the scenarios might not trigger the race condition in the test environment, but would show up in production later.
On a related note:
- Why is generation of the Schema deferred until
.Schema()orclass_schema()()is invoked? Presumably, if someone is usingmarshmallow_dataclass.dataclass, they will be marshalling and/or unmarshalling with that class, so pre-generating the schema wouldn't be a waste. And if they are not using it for marshalling/unmarshalling, then they should just be using the builtindataclasses.dataclass, instead. - If there is not going to be a fix for this race condition, what's a full proof workaround (that's not error-prone)?
- Why is generation of the Schema deferred until
.Schema()orclass_schema()()is invoked?
Because forward references are allowed in type annotations. E.g.
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class A:
b: B
# Cannot construct schema for A yet, because type B is not yet defined.
@dataclass
class B:
x: int
I haven't looked at this in a while, but I vaguely recall that things can get even more twisted when types are defined in function-local scopes.
It turns out that class_schema() may invoke marshmallow.class_registry.register(), which is not thread-safe.
Again, I haven't looked at this for a bit, so I may be mistaken, but my gut response is:
Constructing Marshmallow schemas is not thread-safe. The metaclass for marshmallow.Schema invokes class_register.register, therefore, defining subclasses of marshmallow.Schema is not thread-safe. (That may be what you are already saying.)
That being the case, my first reaction is that we (marshmallow_dataclass) should not be expected to go to any pains to make class_schema() (our construction process for creating Marshmallow schema classes) any more thread-safe than Marshmallow is. If creating schema classes in Marshmallow is not thread-safe, it should not be surprising that using marshmallow_dataclass to construct Marshmallow schema classes is similarly un-thread-safe. (Creating instances of Marshmallow schema classes is thread-safe. At least, it should be.)
Thank you for following up, Jeff. One part of the challenge is that none of this is documented as being not thread-safe, so we stumble on the problem if we’re lucky enough to catch it during dev and testing, and spend non-trivial time troubleshooting it.
The other part is that there is no recommended workaround. What would you recommend as a safe way to use marshmallow data class in a multithreaded environment?
Finally, you mention elsewhere that one needs to access .Schema attribute of the data class in order to avoid memory leaks. Is calling marshmallow_dataclass.class_schema() on the dataclass equivalent to accessing its .Schema attribute for this purpose?
Many thanks, Vitaly
On Mon, May 5, 2025 at 2:48 AM Jeff Dairiki @.***> wrote:
dairiki left a comment (lovasoa/marshmallow_dataclass#282) https://github.com/lovasoa/marshmallow_dataclass/issues/282#issuecomment-2850064408
It turns out that class_schema() may invoke marshmallow.class_registry.register(), which is not thread-safe.
Again, I haven't looked at this for a bit, so I may be mistaken, but my gut response is:
Constructing Marshmallow schemas is not thread-safe. The metaclass for marshmallow.Schema invokes class_register.register, therefore, defining subclasses of marshmallow.Schema is not thread-safe. (That may be what you are already saying.)
That being the case, my first reaction is that we (marshmallow_dataclass) should not be expected to go to any pains to make class_schema() (our construction process for creating Marshmallow schema classes) any more thread-safe than Marshmallow is. If creating schema classes in Marshmallow is not thread-safe, it should not be surprising that using marshmallow_dataclass to construct Marshmallow schema classes is similarly un-thread-safe. (Creating instances of Marshmallow schema classes is thread-safe. At least, it should be.)
— Reply to this email directly, view it on GitHub https://github.com/lovasoa/marshmallow_dataclass/issues/282#issuecomment-2850064408, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK72KW4E7JHMAISSZUGHPL244CVBAVCNFSM6AAAAAB4NBFEDWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQNJQGA3DINBQHA . You are receiving this because you authored the thread.Message ID: @.***>
@dairiki I'm only becoming familiar with this issue now, but I think the Marshmallow maintainers recently (~5 months ago) might be making progress towards fixing this in their experimental module https://github.com/marshmallow-code/marshmallow/pull/2707. I think first release it became available in is 4.0.0. But... I'm not yet 100% sure if the Context object they're now allowing to be a non-dict could help the class registry though since I can't tell for sure what Context is other than a project-internal context object.
Many issues involving non-threadsafety of nested Schema definitions have been closed after they merged this PR, so I'm hopeful the context object is used by the class_registry and that would mean this change would help the class_registry become thread-safe somehow?