requests
requests copied to clipboard
Breaking change in 2.28.0 when using string enums as Headers (working in v2.27.1)
When using string enums as key values for the headers
dict parameter with requests.get()
function in v2.28.0
it raises an InvalidHeader
error but using previous version v2.27.1
works well without any error. It seems there was an unexpected breaking change with that release.
Expected Result
Dict keys for the headers parameter when using requests.get()
are still valid if a string enum is used.
Actual Result
Instead it raises an InvalidHeader
error stating that:
Header must be of type
str
orbytes
, notenum
.
Traceback (most recent call last):
File "/home/yair/PycharmProjects/tests/requests_error.py", line 9, in <module>
requests.get("http://URL", headers={CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"})
File "/home/yair/.local/lib/python3.8/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/home/yair/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/yair/.local/lib/python3.8/site-packages/requests/sessions.py", line 573, in request
prep = self.prepare_request(req)
File "/home/yair/.local/lib/python3.8/site-packages/requests/sessions.py", line 484, in prepare_request
p.prepare(
File "/home/yair/.local/lib/python3.8/site-packages/requests/models.py", line 369, in prepare
self.prepare_headers(headers)
File "/home/yair/.local/lib/python3.8/site-packages/requests/models.py", line 491, in prepare_headers
check_header_validity(header)
File "/home/yair/.local/lib/python3.8/site-packages/requests/utils.py", line 1037, in check_header_validity
raise InvalidHeader(
requests.exceptions.InvalidHeader: Header part (<CustomEnum.TRACE_ID: 'X-B3-TraceId'>) from {<CustomEnum.TRACE_ID: 'X-B3-TraceId'>: '90e85293-afd1-4b48-adf0-fa6daf02359e'} must be of type str or bytes, not <enum 'CustomEnum'>
Reproduction Steps
from enum import Enum
import requests
class CustomEnum(str, Enum):
TRACE_ID = "X-B3-TraceId"
requests.get("http://URL", headers={CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"})
System Information
$ python -m requests.help
{
"chardet": {
"version": "3.0.4"
},
"charset_normalizer": {
"version": "2.0.4"
},
"cryptography": {
"version": "2.8"
},
"idna": {
"version": "3.2"
},
"implementation": {
"name": "CPython",
"version": "3.8.10"
},
"platform": {
"release": "5.13.0-48-generic",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "1010106f",
"version": "19.0.0"
},
"requests": {
"version": "2.28.0"
},
"system_ssl": {
"version": "1010106f"
},
"urllib3": {
"version": "1.26.6"
},
"using_charset_normalizer": false,
"using_pyopenssl": true
}
I know it is easily solvable by making use of the CustomEnum.TRACE_ID.value
to use the stored string instead of using the enum directly with CustomEnum.TRACE_ID
... but it worked before with v2.27.1, so it feels like an unexpected breaking change... :eyes:
Thanks for the report, @bonastreyair! So I think this is on the edge of support. Headers have always been defined to be a str
or bytes
and we've been programmatically enforcing that for header values for 6 years. While this worked in 2.27.1, it wasn't an intentional function of the headers param. I'd like to see how widespread this issue is before we make a decision on cases like this.
In the meantime, Requests has no special logic for Enums, so casting it to a string prior to calling Requests will not affect behavior. That would be the recommendation while we gauge impact. Otherwise, 2.27.1 will continue to work as it did originally.
Here's another instance fwiw: https://github.com/taverntesting/tavern/issues/788 In this case it's just a subclass of str being passed in as header values: https://github.com/taverntesting/tavern/blob/ede36ca062b546cfaf3e6b0940fbac30733586a4/tavern/util/formatted_str.py
We also have similar issues in company internal tools. Isn't the explicit check for str and byte contradicting the duck typing approach of python? At least previously the code was working fine, even though we passed e.g. xml elements that were then implicitly converted to strings.
We also have similar issues in company internal tools. Isn't the explicit check for str and byte contradicting the duck typing approach of python? At least previously the code was working fine, even though we passed e.g. xml elements that were then implicitly converted to strings.
No. It's not contradicting that because any time we magically convert things for users, it tends to be wrong about 50% of the time. Being explicit in our documentation hasn't been effective and these validity checks are designed to help users not send junk that will almost certainly not do what they want. Much of what has been explicitly described above either for narrow cases but not for everyone. Enums for example are not required to have a string values. How does one consistently string-ify it then? Support for people subclassing built-in types notorious for breaking those subclasses is also not a logical choice because again it will surprise people who try it by breaking in a weird way it doing the opposite of what they expect
We encountered this same issue when passing header values whose type is a subclass of str
. Might it make sense to allow subclasses to pass validation? In this case the requests library would not be doing any conversion for the user.
I wrote up a patch for this on Friday. I've been waiting to see how widely this approach is used, because we've only had 4 reports with ~75% of downloads (50 million/week) now using 2.28.0. I think this falls into a very niche usage, but I agree bytes
/str
subclasses did work for both name and value prior to this release.
We're going to monitor for any other issues in 2.28.0 to determine if we need a patch release. I'd be willing to merge the fix above into that with a heavy caveat that this isn't strictly supported. Any breakages due to subclasses deviating in behavior from the base bytes
/str
type likely won't be fixed.
Long term, I think we're going to continue taking a much firmer stance on inputs because people do crazy things when the values aren't bytes
/str
and expect them to just work. This becomes much more problematic when the requests are being sent to systems corrupted due to incorrect "stringification". We can't reasonably handle every case, so we need to draw a line somewhere on what the APIs are intended to support to provide stability.
In our case we're interacting with Salesforce, whose API consists of XML messages. We parse a session ID out of the response from the authentication endpoint using lxml
. This value is of type lxml.etree._ElementUnicodeResult
which is a subclass of str
. We pass it to the constructor of a class from salesforce-bulk
. This class uses requests
under the covers to communicate with Salesforce. It naively assumes that the session ID we supplied can be passed to requests
as a header value without issue.
Our fix was to cast the session ID to an actual str
before giving it to salesforce-bulk
.
Just want to add support for this issue by saying we are also having this issue. Our use is basically:
class ContentType(str, Enum):
JSON = "application/json"
XML = "text/xml"
...
def execute_request(...):
...
headers = {"Content-Type": ContentType.JSON}
response = requests.request(
url=url,
method=method,
params=params,
headers=headers,
data=data
)
Using enums for header names is a common (and generally regarded as a good) practice. +1.
In the ContentType
enum given above, you have what looks like a string, acts like a string, and quacks like a string. It should be treated as a string. requests
shouldn't be responsible for being fed misbehaving subclasses of builtin types IMO.
Anyway, just trying to add one to the 'user engagement' since that was mentioned as a consideration in the 2.28.1 release.
Using enums for header names is a common (and generally regarded as a good) practice. +1.
I don't believe anything in this change precludes the use of enums in your code. Requests is being explicit of what we accept. enum.name
becomes enum.name.value
and everything works fine.
Also seeing this in an Ansible collection, since the test containers updated their requirements to allow a newer version of requests
.
- https://github.com/ansible-collections/community.hashi_vault/issues/289
As explained in #6232 -> we use a string subclass to prevent secrets leaking in certain circumstances, i.e. in Stack Traces.
Just converting them to str
would contradict this intention. So in opposite to using Enums, there is not really a workaround for us.
Just converting them to str would contradict this intention
To be clear, to convert it to a str
before passing to requests would be a problem, but requests converting it to a string isn't a problem in your eyes? That makes absolutely zero sense to me. Requests isn't doing anything special to handle your secret.
Also, you seem to be relying on this class to avoid leaking secrets in stack traces and assuming you/your colleagues will always remember to use it rather than looking for secrets in stack traces before allowing them to be printed to logs by using a formatter to ensure no secrets are printed ever (just to be safe). Defense in depth is always better than expecting a library to do something for you at one or two layers below you.
Actually the Secret class is of course one of multiple layers. We ensure on multiple places (i.e. using our auth libs), that secrets given are wrapped in this special class. We simply add this test to all places in our code that accepts secrets. This helps us also to track secrets within our code base. We not solely rely on it.
And of course we use loggers/formatters that prevent leaking secrets but enforcing that is much harder then wrapping secrets. Better do both.
Also other libraries follow a similar approach. For instance httpx for URLs: https://github.com/encode/httpx/blob/master/httpx/_urls.py#L512
And actually requests does not (as far I found) convert the special class back to a normal string.
def test_header_validation(self, httpbin):
"""Ensure prepare_headers regex isn't flagging valid header contents."""
class StringSubClass(str):
def __repr__(self):
return "Secret"
valid_headers = {
"foo": "bar baz qux",
"bar": b"fbbq",
"baz": "",
"qux": "1",
"sub": StringSubClass("VerySecret")
}
r = requests.get(httpbin("get"), headers=valid_headers)
for key in valid_headers.keys():
valid_headers[key] == r.request.headers[key]
# Ensure type is not modified
assert type(valid_headers[key]) == type(r.request.headers[key])
This modified test works as expected.
Even the function to_native_string
called in prepare_header (and even used for keys only) in _intnal_utils.py
does not convert a value back to a naive string. Of course some send logic could/will do convert the value i.e. to a byte. But to leak the secret a exception needs to be raised in function that is called with this converted value without being handled by the library...
So requests does not have to handle the secret any special. It would only leak the secret in a stack trace, if the header is converted to a normal str class before passing it to a function that is raising the exception which is printed. Yes of course that is not 100% protection against a possible leakage but good enough for us as very unlikely.
Consider the opposite: We tell our devs to use the Secret
everywhere, but then also need to tell them to wrap calls to requests to make them visible.
It like: Hey on all terminals enter your password in the input showing ****
for your input but in this special terminal just enter it in plain text.
Yes maybe also the terminal showing *****
is saving the password in the background as plain text but that is a different story. The main point here how to teach people on how to handle a secret.
To make the case more general again:
Subclasses of strings are common for a number of reason, i.e. a json/xml/yaml libary that uses it to track formatting of a string, a a string enum, mark stuff a secret etc... Enforcing stuff to be naive string class will break this things. If it is a subclass of a string, it should be good enough.
I do not see a case in which both the isinstance
check is okay and the regex matches a valid input and but the string representation is still wrong.
In this case also the output of print(value)
will show the wrong value.
I know that users of libs do crazy stuff but could you elaborate what kind of wrong "stringification" could happen if both checks are valid?
@nateprewitt any news here? I really would love to see your patch applied.
Quick ping here, looking over the proposed patch I don't see a good reason not to apply it, especially given that this kind of behavior actually violates the Python principle of 'duck typing'. It's unclear why Requests needs to be deeply coupled beyond isinstance
. What's the use-case where limiting to the precise coordinate of str
or bytes
is truly necessary? And again, we aren't talking about string conversion, but simply if it says it's an instance of str
or bytes
, letting users be responsible for what happens if it isn't true seems reasonable, for the same reason single-underscore private methods aren't truly private in python, but a convention.
you can resolve this issue by converting to a string before parsing it as a key to the headers directory ` from enum import Enum class CustomEnum(Enum): TRACE_ID = "X-B3-TraceId"
headers = {CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"} requests.get("http://url/", headers=headers)