pydantic icon indicating copy to clipboard operation
pydantic copied to clipboard

order on `extra` fields not respected

Open leosussan opened this issue 5 years ago • 7 comments

Feature Request

Among many things we're using Pydantic for (thank you for this amazing project) is as an interim step for a report / data export engine. On some exports, I know the form of half of the columns & want to validate those. The other half, I don't need to validate.

I use the extra flag on that so the object is flexible enough w/o having to create a new model dynamically. What I would expect: Pydantic would take & create new fields per extra key on the class in the order that the field is received. What happens: new fields are created, but order isn't respected - see the example below.

Feels like a bug; is this the expected behavior? Given the importance that Pydantic places on field order I'd assume the answer is no but wanted to get your thoughts.

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.4
            pydantic compiled: False
                 install path: /Users/leo/.local/share/virtualenvs/services-PFQGUeYK/lib/python3.8/site-packages/pydantic
               python version: 3.8.1 (default, Dec 27 2019, 18:06:00)  [Clang 11.0.0 (clang-1100.0.33.16)]
                     platform: macOS-10.15.2-x86_64-i386-64bit
     optional deps. installed: ['email-validator']

from pydantic import BaseModel

class CSVRow(BaseModel):
    id: int
    name: str
    address: str

    class Config:
        extra = "allow"

row: dict = {"id": 1, "name": "Jon Snow", "address": "100 Winterfell", "Sword Name": "Longclaw", "Birth Year": "281 AC", "Birth Place": "Tower of Joy", "Lovers": ["Ygritte", "Danerys Targaryen"]}

validated_row: CSVRow = CSVRow(**row)

validated_row
# CSVRow(id=1, name='Jon Snow', address='100 Winterfell', Birth Year='281 AC', Birth Place='Tower of Joy', Lovers=['Ygritte', 'Danerys Targaryen'], Sword Name='Longclaw')

validated_row.dict()
# {'id': 1, 'name': 'Jon Snow', 'address': '100 Winterfell', 'Birth Year': '281 AC', 'Birth Place': 'Tower of Joy', 'Lovers': ['Ygritte', 'Danerys Targaryen'], 'Sword Name': 'Longclaw'}

validated_row.json()
# '{"id": 1, "name": "Jon Snow", "address": "100 Winterfell", "Birth Year": "281 AC", "Birth Place": "Tower of Joy", "Lovers": ["Ygritte", "Danerys Targaryen"], "Sword Name": "Longclaw"}'

leosussan avatar Feb 16 '20 10:02 leosussan

I've changed the label from bug to feature request since we don't make any guarantee about this working the other way at present.

The reason for this is:

https://github.com/samuelcolvin/pydantic/blob/645e5fe6a0c74c95c24410571e9bb804af3eb677/pydantic/main.py#L857-L869

This is the most performant way (I know of) to achieve what's currently required, however I would consider a change to maintain order.

But it would depend on the performance impact, if it's too large I'd rather avoid the change.

samuelcolvin avatar Feb 17 '20 18:02 samuelcolvin

Gotcha - I changed the header on the original issue to reflect your label change.

Would you want ordering on extra fields to be an option on config & split the difference? Or is the additional complexity not worth that consideration to you?

Given the relatively substantial performance implications of using a list vs. sets as currently implemented, by performance impact I'd imagine it would be hard to justify. unless you introduce an ordered implementation of set, a la something like https://github.com/LuminosoInsight/ordered-set, which sort of splits the difference but slows things down a little bit).

leosussan avatar Feb 17 '20 20:02 leosussan

IIUC the feature request is about key ordering at a model level. But maybe an intermediate option with the extra fields ordered after the declared ones could be implemented with no performance impact. This wouldn't resolve the feature request but it seems a valuable feature by itself and maybe a starting point.

names_used is only employed to evaluate extra. An oversimplified and far-from-reality example that would set extra before the fields iteration and modify it in place suggests that this strategy could be more performant (and less memory-consuming) while avoiding the unordering introduced by the set:

from string import ascii_letters as L

DECLARED, EXTRA = 5, 5
input_data = {L[k]: k for k in range(DECLARED + EXTRA)}
fields = [L[k] for k in range(DECLARED)]


def test1():
    # unordered extra
    names_used = set()

    for f in fields:
        names_used.add(f)

    extra = input_data.keys() - names_used


def test2():
    # ordered extra
    extra = list(input_data)

    for f in fields:
        extra.remove(f)


if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test1()", setup="from __main__ import test1"))
    print(timeit.timeit("test2()", setup="from __main__ import test2"))

nuno-andre avatar Mar 30 '21 17:03 nuno-andre

+1 on this feature request. I've run into this field order problem when creating tests for a FastAPI application. Using .dict() seems to keep field order, but .json() fails regularly.

from pydantic import BaseModel


class TestInput(BaseModel):
    class config:
        extra = "allow"

class TestOutput(TestInput):
    id: str
    metadata: str


keys_input = {
    "key1": True,
    "key2": False
}
keys_output = {
    "id": "some_id",
    "key1": True,
    "key2": False,
    "metadata": "some_metadata"
}

test_input = TestInput.parse_obj(keys_input)
test_output = TestOutput.parse_obj(keys_output)

exclude_fields = {"id", "metadata"}
assert test_input.dict() == test_output.dict(exclude=exclude_fields)
assert test_input.json() == test_output.json(exclude=exclude_fields)

havardthom avatar Jun 22 '21 08:06 havardthom

Any updates on this? Did you guys found a workaround to repect the field order? I've also run into this problem building a FastAPI application

xoelop avatar Jun 20 '22 14:06 xoelop

@xoelop I've used the following as a workaround. Note that I am still using Pydantic v1.8 and haven't tested on more recent versions.

class Foo(BaseModel):
  def __init__(self, *args, **kwargs):
    # HACK: Because Pydantic does not preserve order for extra parameters (https://github.com/samuelcolvin/pydantic/issues/1234)
    #       we assign them in order after the class has been created
    # HACK #2: To do this, we temporarily set 'allow_mutation' to True, and
    #       reset it to the value in Config afterwards.
    extra_kwargs = {}
    for k in list(kwargs):
      if k not in self.__fields__:
        extra_kwargs[k] = kwargs.pop(k)

    super().__init__(*args, **kwargs)
    
    # vvvv Enable mutations vvvv
    old_allow_mutation = self.__config__.allow_mutation
    self.__config__.allow_mutation = True
    for k, v in extra_kwargs.items():
      setattr(self, k, v)
    self.__config__.allow_mutation = old_allow_mutation
    # ^^^^ Reset mutability as before ^^^^ 

alcrene avatar Jun 25 '22 21:06 alcrene

This is fixed in v2 I think. I'll add a unit test to confirm that order is maintained.

I doubt it will get changed in v1, but hopefully v2 will be released this year.

samuelcolvin avatar Jun 26 '22 07:06 samuelcolvin