pydantic icon indicating copy to clipboard operation
pydantic copied to clipboard

BaseModel.__iter__ may be a footgun

Open davidhewitt opened this issue 2 years ago • 3 comments

Initial Checks

  • [X] I have searched Google & GitHub for similar requests and couldn't find anything
  • [X] I have read and followed the docs and still think this feature is missing

Description

See #6977

BaseModel.__iter__ exists (according to its doc) to allow dict(model). This is a handy trick but it does expose users to an accidental footgun if they're not expecting BaseModel types to behave in this way.

Perhaps we should consider deprecating / removing in V3, with the deprecation message maybe advising users to use model.__dict__?

Affected Components

Selected Assignee: @hramezani

davidhewitt avatar Aug 01 '23 17:08 davidhewitt

@davidhewitt Did we think about this?

Kludex avatar Aug 21 '23 13:08 Kludex

No, I think this needs a V3 label of some kind, and we should leave it here to discuss in V3 sometime in the future.

davidhewitt avatar Aug 22 '23 09:08 davidhewitt

Code that implements the Mapping interface breaks from v1→v2 because:

  • Pydantic warns if you don't inherit first from BaseModel: don't do MySubClass(MyMapping, BaseModel)
  • Pydantic implements __iter__ to return items, which is inconsistent and breaks Mapping. Mapping expects to iterate over keys and then look up the keys with __getitem__, but it receives item tuples instead, leading to KeyError.

Note also that the current "syntactic sugar" gives the nice feeling like iterating over a dict, but surprisingly that should give actually: list(iter(dict(key="value"))) == ["key"].

Possibly it is better to have dedicated named iterator methods instead of the current __iter__ implementation at the top-level of the object.

By being consistent with Python's abstract base classes, Pydantic would not only prevent errors, but could still offer the same convenience: dict(**model)

aeisenbarth avatar Feb 05 '24 14:02 aeisenbarth

Hi, so I have actually today shot myself in the foot (didn't hurt much) with this unexpected behavior of the BaseModel iter implementation.

I have a class like this

class FrequencyRange(BaseModel):
    frequency_low_hz: float = Field(default=0, gt=0)
    frequency_high_hz: float = Field(default=0, gt=0)

And it's useful for me that it's a Pydantic model because I want to have value restrictions, value checks etc. But sometimes I want it to behave as a simple tuple like (0, 100), to pass it to 3rd party apps that sometimes want tuples. But instead when I loop over it and print the values I get

('frequency_low_hz', 0.5)
('frequency_high_hz', 12.0)

So intead of 1 tuple I get 2! Of course I can overwrite the __iter__ method, but then I'm breaking dict(model) and it may become relevant somewhere. So instead I wrote a method as_tuple(), which is fine but defining __iter__ feels cleaner imho.

I too would favor @aeisenbarth solution of changing the default behavior from __iter__ to a separate specialized method.

toni-neurosc avatar Jun 04 '24 17:06 toni-neurosc

Pydantic models are rather comparable to key-value stores (mappings) than sequences. (And I would not rely on a stable order, although Python 3 dicts and Pydantic fields actually have). Iterating a Python mapping yields the keys, not the values (and not key-value tuples, as Pydantic incorrectly does).

So what you pass to other apps must be more tuple-like.

  • I think what you are looking for are maybe named tuples. If you define class FrequencyRange(NamedTuple), Pydantic models can have it as field type and serialize/deserialize it, except that then FrequencyRange alone doesn't have all the Pydantic features.
  • If you want a Pydantic model, you could implement a custom base model class SequenceLikeBaseModel where you override __iter__, __getitem__(int), __len__.
  • Or you pass frequency.model_dump().values() to other apps.

aeisenbarth avatar Jun 04 '24 18:06 aeisenbarth

Hi @aeisenbarth , thanks for the prompt response and your suggestions, I will keep those possibilities in mind going forward with Pydantic (still learning). For now my problem is solved simply having a way to return it as a tuple. I did indeed try Dataclasses and NamedTuples but for some specific functionality BaseModel was simply more convenient and not much of an issue to circumvent the unexpected __iter__ behavior.

I just thought, as I found this issue while googling about this, that I would present a case in which this issue could come up, and I liked your idea of freeing up the __iter__ method for Pydantic's BaseModel in general, not necessarily just for my use-case. I believe it's just better to have that method available if it can be done without breaking any feature, just makes Pydantic model's even more flexible imho.

Again, thanks a lot for taking the time to answer 😃

toni-neurosc avatar Jun 05 '24 11:06 toni-neurosc