pydantic
pydantic copied to clipboard
Updating a pydantic dict updates the ORM-Model too when using from_orm()
Checks
- [x] I added a descriptive title to this issue
- [x] I have searched (google, github) for similar issues and couldn't find anything
- [x] I have read and followed the docs and still think this is a bug
Bug
Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())"
:
poetry run python -c "import pydantic.utils; print(pydantic.utils.version_info())
dquote> "
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
pydantic version: 1.9.1
pydantic compiled: True
install path: /...../.venv/lib/python3.9/site-packages/pydantic
python version: 3.9.13 (main, May 24 2022, 21:28:31) [Clang 13.1.6 (clang-1316.0.21.2)]
platform: macOS-12.4-x86_64-i386-64bit
optional deps. installed: ['typing-extensions']
from typing import Any
from pydantic import BaseModel as BaseSchema
from pydantic import Extra
class ORMModel:
def __init__(self, mystr: str, dict: dict[str, Any]) -> None:
self.mystr = mystr
self.mydict = dict
class OrmSchema(BaseSchema):
class Config:
orm_mode = True
anystr_strip_whitespace = True
extra = Extra.ignore
underscore_attrs_are_private = True
mystr: str
mydict: dict
mymodel = ORMModel("foobar", {"foo": "bar"})
myschema = OrmSchema.from_orm(mymodel)
myschema.mystr = "barfoo"
myschema.mydict.update({"hello": "you"})
print(myschema.mystr) # barfoo = OK
print(mymodel.mystr) # foobar = OK
print(myschema.mydict) # {'foo': 'bar', 'hello': 'you'} = OK
print(mymodel.mydict) # {'foo': 'bar', 'hello': 'you'} = UNEXPECTED
Interestingly, when using a Dict
instead of a dict
, it works as expected
from typing import Any, Dict
from pydantic import BaseModel as BaseSchema
from pydantic import Extra
class ORMModel:
def __init__(self, mystr: str, dict: dict[str, Any]) -> None:
self.mystr = mystr
self.mydict = dict
class OrmSchema(BaseSchema):
class Config:
orm_mode = True
anystr_strip_whitespace = True
extra = Extra.ignore
underscore_attrs_are_private = True
mystr: str
mydict: Dict
mymodel = ORMModel("foobar", {"foo": "bar"})
myschema = OrmSchema.from_orm(mymodel)
myschema.mystr = "barfoo"
myschema.mydict.update({"hello": "you"})
print(myschema.mystr) # barfoo = OK
print(mymodel.mystr) # foobar = OK
print(myschema.mydict) # {'foo': 'bar', 'hello': 'you'} = OK
print(mymodel.mydict) # {'foo': 'bar'} = OK
The reason for this behavior is the way fields are validated here https://github.com/samuelcolvin/pydantic/blob/master/pydantic/fields.py#L825
dict type is inferred as SHAPE_SINGLETON
, and validate_singleton simply uses the value as is and not copying it, while Dict as inferred as SHAPE_DICT
, and _validate_mapping_like
does copy the dict field by field
I'm not sure if this is intended or not,
@samuelcolvin WDYT? is this actually a bug with pydantic starting python3.9 where standard collections can now be used as type hints?
should _type_analysis
function be updated to detect dict
as SHAPE_DICT
as well? (and if so then of course list and tuple as well)
We should probably be consistent with dict
and Dict
, but more generally pydantic can't really make any guarantee about copying everything so editing a model could never modify it's input data, without a massive performance impact.
The only way to be sure you're copying everything is to use deepcopy
(or pydantic's smart_deepcopy
which is sometimes significantly faster).
Example:
from typing import Any, List
from pydantic import BaseModel
class Foo(BaseModel):
x: List[Any]
d = {'v': 1}
f = Foo(x=[d])
f.x[0]['another'] = 2
print(d)
#> {'v': 1, 'another': 2}
This isn't going to change since copying everything every time would be much slower, I think the best solution would be to document that if you need to be sure you modify an input by modifying the output, use Foo(**deepcopy(data))
(or Foo.from_orm(deepcopy(data))
)
We should probably be consistent with dict and Dict, but more generally pydantic can't really make any guarantee about copying everything so editing a model could never modify it's input data, without a massive performance impact.
This sounds like unexpected behaviour with possible security consequences.
I think Pydantic should default to the most secure behaviour (deepcopy), and a faster but more naive implementation opt-in and properly documented.
I understand why you might think that, but in the vast majority of situations it's not what people want and would be unnecessarily slow.
Example
data = yaml_or_toml_or_json_or_pickle_or_whatever.load(raw_input_data_bytes)
m = MyModel(**data)
Here we definitely do not want MyModel
to perform deepcopy
.
Without any statistics, I would guess this is >99% of usage, I'm not adding deepcopy default true (configurable or not) just to avoid the possibility of a bug in <1% of cases.
I understand why you might think that, but in the vast majority of situations it's not what people want
To be pedantic, I think in the vast majority of cases people would not expect this behaviour.
Perhaps in most cases this unintended side-effect does not have a (security) impact. But where it does, it will be a confusing thing to debug.
My $0.02 is to make this behaviour explicit to the programmer. E.g. via a mandatory flag if anything.
Should this ticket remain open or shall it be closed as a Wont Fix?
We’re no longer actively developing Pydantic V1 (although it will continue to receive security fixes for the next year or so), so we’re closing issues solely related to V1. If you really think this shouldn’t be closed, please comment or create a new issue 🚀.