marshmallow
When deserializing/serializing in a Spark job, _Missing fields throw TypeErrors
I've noticed that when running under Spark, missing is not skipped. It is propagated to the deserializer/serializer, which, in the case of an int value, explodes with:
TypeError(int() argument must be a string, a bytes-like object or a real number, not '_Missing')
Also, all the missing string fields are serialized as '<marshmallow.missing>'.
Oddly, this doesn't happen in a unit test, only when I execute within Spark. The exception is thrown here: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L520
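One possible explanation, which this report does not confirm, is pickling. PySpark serializes functions and the objects they reference (typically with cloudpickle) to ship them to worker processes. A sentinel that is compared by identity (`value is missing`) comes back from a pickle round trip as a new instance, so the identity check no longer matches. That would explain why the problem shows up only under Spark and not in a unit test. The sketch below does not use marshmallow's code. The `_Missing` and `_PickleSafeMissing` classes and the `dump` and `round_trip` helpers are hypothetical stand-ins that only illustrate the pickling behavior. The `__reduce__` variant shows one standard way to keep a sentinel's identity across pickling.

```python
import pickle


class _Missing:
    """Hypothetical stand-in for a sentinel that is compared by identity."""

    def __bool__(self):
        return False

    def __repr__(self):
        return "<missing>"


missing = _Missing()


class _PickleSafeMissing(_Missing):
    """Same sentinel, but pickled as a reference to its module-level name."""

    def __reduce__(self):
        # A string return value tells pickle to store a reference to the
        # module global with this name instead of building a new instance.
        return "safe_missing"


safe_missing = _PickleSafeMissing()


def dump(data, sentinel):
    """Hypothetical serializer: drop every value that *is* the sentinel."""
    return {k: v for k, v in data.items() if v is not sentinel}


def round_trip(obj):
    """Simulate shipping an object to another process via pickle."""
    return pickle.loads(pickle.dumps(obj))


record = {"id": 1, "name": missing}
print(dump(record, missing))              # {'id': 1}
print(dump(round_trip(record), missing))  # {'id': 1, 'name': <missing>}

# The copied sentinel is now just an ordinary object: str() falls back to its
# repr, and int(copied) would raise a TypeError like the one in the report.
copied = round_trip(missing)
print(str(copied))                        # <missing>

# With __reduce__, unpickling returns the original object, so it is skipped.
safe_record = {"id": 1, "name": safe_missing}
print(dump(round_trip(safe_record), safe_missing))  # {'id': 1}
```

If this is the cause, a check such as `round_trip(missing) is missing` evaluated inside a Spark task would return False.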
Here is my pip freeze:
attrs==21.4.0
iniconfig==1.1.1
marshmallow==3.15.0
marshmallow-dataclass==8.4.1
marshmallow-enum==1.5.1
marshmallow-union==0.1.15
mypy-extensions==0.4.3
packaging==21.3
pluggy==1.0.0
py==1.11.0
py4j==0.10.9
pyparsing==3.0.8
pyspark==3.1.2
pytest==6.2.5
PyYAML==6.0
toml==0.10.2
typeguard==2.13.3
typing-inspect==0.7.1
typing_extensions==4.1.1
I'm running under Python 3.10.2.