pydantic icon indicating copy to clipboard operation
pydantic copied to clipboard

Updating a pydantic dict updates the ORM-Model too when using from_orm()

Open MprBol opened this issue 2 years ago • 6 comments

Checks

  • [x] I added a descriptive title to this issue
  • [x] I have searched (google, github) for similar issues and couldn't find anything
  • [x] I have read and followed the docs and still think this is a bug

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

poetry run python -c "import pydantic.utils; print(pydantic.utils.version_info())
dquote> "
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
             pydantic version: 1.9.1
            pydantic compiled: True
                 install path: /...../.venv/lib/python3.9/site-packages/pydantic
               python version: 3.9.13 (main, May 24 2022, 21:28:31)  [Clang 13.1.6 (clang-1316.0.21.2)]
                     platform: macOS-12.4-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

from typing import Any
from pydantic import BaseModel as BaseSchema
from pydantic import Extra


class ORMModel:
    def __init__(self, mystr: str, dict: dict[str, Any]) -> None:
        self.mystr = mystr
        self.mydict = dict


class OrmSchema(BaseSchema):
    class Config:
        orm_mode = True
        anystr_strip_whitespace = True
        extra = Extra.ignore
        underscore_attrs_are_private = True

    mystr: str
    mydict: dict


mymodel = ORMModel("foobar", {"foo": "bar"})

myschema = OrmSchema.from_orm(mymodel)

myschema.mystr = "barfoo"
myschema.mydict.update({"hello": "you"})

print(myschema.mystr) # barfoo = OK
print(mymodel.mystr) # foobar = OK
print(myschema.mydict) # {'foo': 'bar', 'hello': 'you'} = OK
print(mymodel.mydict) # {'foo': 'bar', 'hello': 'you'}  = UNEXPECTED

MprBol avatar Jul 14 '22 07:07 MprBol

Interestingly, when using a Dict instead of a dict, it works as expected

from typing import Any, Dict
from pydantic import BaseModel as BaseSchema
from pydantic import Extra


class ORMModel:
    def __init__(self, mystr: str, dict: dict[str, Any]) -> None:
        self.mystr = mystr
        self.mydict = dict


class OrmSchema(BaseSchema):
    class Config:
        orm_mode = True
        anystr_strip_whitespace = True
        extra = Extra.ignore
        underscore_attrs_are_private = True

    mystr: str
    mydict: Dict


mymodel = ORMModel("foobar", {"foo": "bar"})

myschema = OrmSchema.from_orm(mymodel)

myschema.mystr = "barfoo"
myschema.mydict.update({"hello": "you"})

print(myschema.mystr) # barfoo = OK
print(mymodel.mystr) # foobar = OK
print(myschema.mydict) # {'foo': 'bar', 'hello': 'you'} = OK
print(mymodel.mydict) # {'foo': 'bar'}  = OK

MprBol avatar Jul 14 '22 07:07 MprBol

The reason for this behavior is the way fields are validated here https://github.com/samuelcolvin/pydantic/blob/master/pydantic/fields.py#L825 dict type is inferred as SHAPE_SINGLETON, and validate_singleton simply uses the value as is and not copying it, while Dict as inferred as SHAPE_DICT, and _validate_mapping_like does copy the dict field by field

I'm not sure if this is intended or not, @samuelcolvin WDYT? is this actually a bug with pydantic starting python3.9 where standard collections can now be used as type hints? should _type_analysis function be updated to detect dict as SHAPE_DICT as well? (and if so then of course list and tuple as well)

danielkatzan avatar Jul 30 '22 16:07 danielkatzan

We should probably be consistent with dict and Dict, but more generally pydantic can't really make any guarantee about copying everything so editing a model could never modify it's input data, without a massive performance impact.

The only way to be sure you're copying everything is to use deepcopy (or pydantic's smart_deepcopy which is sometimes significantly faster).

Example:

from typing import Any, List
from pydantic import BaseModel

class Foo(BaseModel):
    x: List[Any]

d = {'v': 1}
f = Foo(x=[d])
f.x[0]['another'] = 2
print(d)
#> {'v': 1, 'another': 2}

This isn't going to change since copying everything every time would be much slower, I think the best solution would be to document that if you need to be sure you modify an input by modifying the output, use Foo(**deepcopy(data)) (or Foo.from_orm(deepcopy(data)))

samuelcolvin avatar Jul 30 '22 18:07 samuelcolvin

We should probably be consistent with dict and Dict, but more generally pydantic can't really make any guarantee about copying everything so editing a model could never modify it's input data, without a massive performance impact.

This sounds like unexpected behaviour with possible security consequences.

I think Pydantic should default to the most secure behaviour (deepcopy), and a faster but more naive implementation opt-in and properly documented.

MprBol avatar Aug 08 '22 07:08 MprBol

I understand why you might think that, but in the vast majority of situations it's not what people want and would be unnecessarily slow.

Example

data = yaml_or_toml_or_json_or_pickle_or_whatever.load(raw_input_data_bytes)
m = MyModel(**data)

Here we definitely do not want MyModel to perform deepcopy.

Without any statistics, I would guess this is >99% of usage, I'm not adding deepcopy default true (configurable or not) just to avoid the possibility of a bug in <1% of cases.

samuelcolvin avatar Aug 08 '22 09:08 samuelcolvin

I understand why you might think that, but in the vast majority of situations it's not what people want

To be pedantic, I think in the vast majority of cases people would not expect this behaviour.

Perhaps in most cases this unintended side-effect does not have a (security) impact. But where it does, it will be a confusing thing to debug.

My $0.02 is to make this behaviour explicit to the programmer. E.g. via a mandatory flag if anything.

Should this ticket remain open or shall it be closed as a Wont Fix?

MprBol avatar Aug 10 '22 13:08 MprBol

We’re no longer actively developing Pydantic V1 (although it will continue to receive security fixes for the next year or so), so we’re closing issues solely related to V1. If you really think this shouldn’t be closed, please comment or create a new issue 🚀.

sydney-runkle avatar Dec 06 '23 19:12 sydney-runkle