InvalidSchemaFormatValue exception when unmarshalling string with format 'byte'
When validating a property of type
type: string
format: byte
it is assumed that the base64-decoded value can be represented as a UTF-8 string. As a result, InvalidSchemaFormatValue is raised when unmarshalling binary data that cannot be decoded as UTF-8, e.g. the value base64.b64encode(b'\xff').
The exception text is:
Failed to format value /w== to format byte: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte.
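The underlying failure can be reproduced with the standard library alone. This is a minimal sketch that does not touch openapi-core; the variable names are only illustrative.

```python
import base64

# b'\xff' is a single byte that is not valid UTF-8 on its own.
encoded = base64.b64encode(b"\xff")   # the value sent over the wire
raw = base64.b64decode(encoded)       # decoding the base64 works fine

try:
    # This is the step that fails: treating the decoded bytes as text.
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(encoded, raw, error_message)
```

The base64 step succeeds; only the conversion of the decoded bytes to text raises.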
IMO this was introduced with the fix for #117. I don't know the intention behind that fix or why byte should be a text type, but the way it's implemented fails for data that is not valid UTF-8. I see two possible solutions, both of which work for me:
Go back to the way byte was handled before #117 and only check that the value can be base64-decoded:
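from base64 import b64decode

def format_byte(value, encoding='utf8'):
    # Return the raw decoded bytes; encoding is kept only for signature compatibility.
    return b64decode(value)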
Or, ignore errors when converting the base64 decoded value to string:
def format_byte(value, encoding='utf8'):
    # "ignore" drops any bytes that cannot be decoded with the given encoding.
    return text_type(b64decode(value), encoding, "ignore")
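To make the trade-off between the two options concrete, here is a small standalone comparison. It is only a sketch: the function names format_byte_raw and format_byte_lossy are hypothetical, and the built-in str stands in for text_type on the assumption that this runs on Python 3.

```python
from base64 import b64decode

def format_byte_raw(value):
    # Option 1: return the decoded bytes untouched.
    return b64decode(value)

def format_byte_lossy(value, encoding="utf8"):
    # Option 2: convert to text, silently dropping undecodable bytes.
    return str(b64decode(value), encoding, "ignore")

binary_payload = "/w=="   # base64 of b'\xff'
text_payload = "aGk="     # base64 of b'hi'

print(format_byte_raw(binary_payload))          # bytes are preserved
print(repr(format_byte_lossy(binary_payload)))  # the 0xff byte is dropped
print(format_byte_raw(text_payload), format_byte_lossy(text_payload))
```

Both variants avoid the exception, but the second one loses data for non-text payloads, which may matter if the property is meant to carry arbitrary binary content.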
I can provide a PR if someone gives me a signal which solution is preferred.
Meanwhile one can easily fix this with a monkeypatch 🙈
from base64 import b64decode

import openapi_core.unmarshalling.schemas.util

def new_format_byte(value, encoding='utf8'):
    # Skip the text conversion and return the decoded bytes as-is.
    return b64decode(value)

openapi_core.unmarshalling.schemas.util.format_byte = new_format_byte
A workaround without monkey-patching: if you are using RequestValidator, pass custom_formatters:
import base64

from openapi_core.validation.request.validators import RequestValidator
from openapi_core.unmarshalling.schemas.formatters import Formatter

validator = RequestValidator(
    spec, url,
    custom_formatters={
        'byte': Formatter.from_callables(lambda x: True, base64.b64decode),
    })
validator.validate(request).raise_for_errors()
It's also possible to use a more intelligent check instead of lambda x: True if you care about which exception you get from validation, but I think this is still enough to make sure that the value is valid.
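One possible shape for such a stricter check, as a sketch using only the standard library. The name is_base64 is hypothetical, and whether openapi-core surfaces a False result as the exception you want is an assumption that would need to be verified against the installed version.

```python
import base64
import binascii

def is_base64(value):
    """Return True only if value decodes as strictly valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError, TypeError):
        return False
    return True

print(is_base64("/w=="))         # well-formed input
print(is_base64("not base64!"))  # contains characters outside the alphabet
```

It could then be passed in place of the lambda, e.g. Formatter.from_callables(is_base64, base64.b64decode), following the same call shown in the workaround above.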