`pydantic.Json` should enforce that dict keys may only be of type `str`
In a JSON object, only strings are valid keys. This could be enforced through the `pydantic.Json` type, which currently has some odd casting and error behaviour for various invalid key types.

For integers, which are valid Python dict keys but not valid JSON keys, it would be nice to make this a definition-time error; as it stands, the runtime behaviour is really weird:
```python
from typing import Dict, List
from pydantic import BaseModel, Json

class IntKeys(BaseModel):
    x: Json[Dict[int, int]]  # This type is not valid JSON
```

```python
>>> IntKeys(x='{1: [2]}')  # Note: this string is invalid JSON
ValidationError: 1 validation error for IntKeys
x
  Invalid JSON (type=value_error.json)

>>> IntKeys(x='{"1": [2]}')  # Note: valid JSON, but wrong key type - pydantic casts it to int
IntKeys(x={1: [2]})
```
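For comparison (my addition, stdlib only), the `json` module shows where the asymmetry comes from: serializing silently coerces int keys to strings, while parsing always leaves object keys as `str` - so the cast back to `int` above is pydantic's coercion layer, not the JSON parser's.

```python
import json

# Serializing: int keys are silently coerced to JSON strings
assert json.dumps({1: [2]}) == '{"1": [2]}'

# Parsing: object keys always come back as str; any cast to int
# afterwards is pydantic's doing, not the JSON parser's
assert json.loads('{"1": [2]}') == {"1": [2]}
```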
Lists are not valid Python dict keys (they are unhashable), and pydantic doesn't seem to try casting them either. This makes much more sense, but a definition-time error would still be nice.
```python
class ListKeys(BaseModel):
    x: Json[Dict[List[int], int]]  # And this isn't even valid Python
```

```python
>>> ListKeys(x='{[]: 2}')
ValidationError: 1 validation error for ListKeys
x
  Invalid JSON (type=value_error.json)

>>> ListKeys(x='{"[]": 2}')
ValidationError: 1 validation error for ListKeys
x -> __key__
  value is not a valid list (type=type_error.list)
```
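As a side note (my addition, no pydantic involved): the "isn't even valid Python" part is visible in plain Python, since lists are mutable and therefore unhashable, so they can never be dict keys at runtime.

```python
# A dict literal with a list key fails before any JSON is involved:
# lists are mutable and therefore unhashable
try:
    {[]: 2}
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```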
Like #2095, I found these while writing tests for #2017.
I'm not sure `x: Json[Dict[int, int]]` should be invalid, actually.

You may agree or disagree with pydantic's enthusiasm for coercion over validation (e.g. if you have an `int` field, `"1"` will automatically be coerced to an int), but that's how pydantic is. With that convention, I think most people would expect `x: Json[Dict[int, int]]` to work too, e.g. the JSON `'{"123": "321"}'` should be coerced to `{123: 321}`.
There are numerous discussions about this (filter issues by the "strictness" label) and I'm very open to changing the behaviour in future. But while pydantic is the way it is, I don't think we should change this. Even if we change it for some types, I suspect most people would still want strings coerced to ints - think about text-only situations like environment variables and URL parameters.
With `Dict[List[int], int]` I agree this is invalid, and a class-creation-time check would be good, but that sounds like a separate feature and quite complex to implement.
I agree with @samuelcolvin that this should not raise an error at definition time.

It does, however, leave open the issue of "surprising" errors while exporting to JSON. I just tripped up on this with regard to UUID.

I'd like to request that something more is done to facilitate exporting dictionaries to JSON such that they can be read back by pydantic. Pydantic already has mechanisms in place to coerce exotic data types to/from strings; it's most surprising to discover that this doesn't happen with dictionary keys.
An example:

```python
from datetime import datetime
from typing import Dict, List
from uuid import UUID
from pydantic import BaseModel

class Wigit(BaseModel):
    id: UUID
    name: str

class WigitRecord(BaseModel):
    timestamp: datetime
    values: Dict[str, float]

class WigitRecordSet(BaseModel):
    origin_timestamp: datetime
    records: Dict[UUID, List[WigitRecord]]
```
It's very surprising to discover that you can `Wigit.parse_raw(item.json())` and `WigitRecord.parse_raw(item.json())`, but when the same components are put together a little differently in `WigitRecordSet.parse_raw(item.json())`, it trips up on an error:

```
TypeError: keys must be str, int, float, bool or None, not UUID
```
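To help localize the failure (my understanding, not a confirmed code path): this `TypeError` comes from the stdlib encoder itself - pydantic v1's `.json()` ultimately calls `json.dumps`, whose `default` hook is only consulted for unserializable *values*, never for keys - so plain `json.dumps` reproduces it without pydantic.

```python
import json
from uuid import UUID

try:
    json.dumps({UUID("00000000-0000-0000-0000-000000000000"): 1})
except TypeError as exc:
    # Same error the WigitRecordSet example hits
    print(exc)  # keys must be str, int, float, bool or None, not UUID
```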
Are there already any workarounds for getting `.json()` to work with non-string keys? Is it possible to fix this in pydantic so that it just works?
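One workaround I'm aware of (a stdlib-only sketch; `stringify_keys` is a hypothetical helper, not pydantic API) is to stringify dict keys recursively before serializing, e.g. applied to the output of `.dict()`. Note it only fixes keys - non-JSON values such as `datetime` would still need pydantic's own encoders.

```python
import json
from uuid import UUID

def stringify_keys(obj):
    """Recursively convert dict keys to str so json.dumps accepts them."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj

data = {UUID("00000000-0000-0000-0000-000000000000"): [{"n": 1.5}]}
assert json.dumps(stringify_keys(data)) == (
    '{"00000000-0000-0000-0000-000000000000": [{"n": 1.5}]}'
)
```

Pydantic can then parse the result back, since it coerces the string keys to `UUID` on the way in.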
I would also be very interested in a solution/workaround. I get the same error when calling `.json()` on a model which has a field of type `dict[UUID, Any]`.
I acknowledge the problem, we'll need to find a fix in V2.
I believe this is fixed in v2:

```python
from typing import Any
from uuid import UUID
from pydantic import BaseModel

class Model(BaseModel):
    x: dict[UUID, Any]

m = Model(x={'00000000-0000-0000-0000-000000000000': 1})
print(m)
#> x={UUID('00000000-0000-0000-0000-000000000000'): 1}

assert m.model_dump_json() == '{"x":{"00000000-0000-0000-0000-000000000000":1}}'
```
If there are any similar issues, please report them — I think we now have the infrastructure necessary to resolve JSON-specific serialization behaviour in a more reliable way, so we should be able to address similar issues more easily.