requests
requests copied to clipboard
Endless history: the history contains a self-reference
The history of a requests contains a self-reference to the history owner. The history will be endless.
Expected Result
If I traverse recursive the complete history of a requests, this will be a finally graph. The history of a request is a tree without cycles.
R1 (history: 2)
R2 (no history)
R3 (no history)
Actual Result
If I traverse recursive the complete history of a requests, the program breaks with recursive error. RecursionError: maximum recursion depth exceeded while calling a Python object
The history contains a self-reference to the history owner.
The history of a request is a graph with a cycle.
R1 (history: 2)
R2 (no history)
R3 (history: 1)
R3 (history: 1)
R3 (history: 1)
R3 (history: 1)
R3 (history: 1)
....
id=140537834271072 history=2
index=0 id=140537834079136
index=1 id=140537834080960
id=140537834079136 history=0
id=140537834080960 history=1
index=0 id=140537834080960
id=140537834080960 history=1
index=0 id=140537834080960
id=140537834080960 history=1
index=0 id=140537834080960
id=140537834080960 history=1
index=0 id=140537834080960
....
id=140537834080960 history=1
index=0 id=140537834080960
id=140537834080960 history=1
index=0 id=140537834080960
Traceback (most recent call last):
File "/Users/andreas/PycharmProjects/cce/main.py", line 12, in <module>
history(requests.get('https://coord.info/GC8T8E8'))
File "/Users/andreas/PycharmProjects/cce/main.py", line 9, in history
history(item)
File "/Users/andreas/PycharmProjects/cce/main.py", line 9, in history
history(item)
File "/Users/andreas/PycharmProjects/cce/main.py", line 9, in history
history(item)
[Previous line repeated 993 more times]
File "/Users/andreas/PycharmProjects/cce/main.py", line 5, in history
print(f"id={id(r)} history={len(r.history)}")
RecursionError: maximum recursion depth exceeded while calling a Python object
Process finished with exit code 1
Reproduction Steps
import requests
def history(r):
print(f"id={id(r)} history={len(r.history)}")
for index, item in enumerate(r.history):
print(f"\tindex={index} id={id(item)}")
for index, item in enumerate(r.history):
history(item)
history(requests.get('https://coord.info/GC8T8E8'))
System Information
{
"chardet": {
"version": null
},
"charset_normalizer": {
"version": "2.1.1"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "3.4"
},
"implementation": {
"name": "CPython",
"version": "3.9.2"
},
"platform": {
"release": "21.6.0",
"system": "Darwin"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.28.1"
},
"system_ssl": {
"version": "1010109f"
},
"urllib3": {
"version": "1.26.13"
},
"using_charset_normalizer": true,
"using_pyopenssl": false
}
We're aware, see https://github.com/psf/requests/issues/2690 which is documented in the README.
I'm confused. I do not talk about the GIT history. I talked about the history of an request. Please, can you reopen my issue?
Sorry about that!
You don't need to traverse it recursively. Each response should have the entire history of the prior response.
In other words,
- You get
r3back which has a history that looks like[r2, r1]. r2has a history that looks like[r1], etc.
You only need to look at this history of the one you're seeing as the final response. It will contain the whole history.
So, this is a consequence of someone years ago arguing that each history response should have it's own history despite the linearity of the history and the fact that the top-level response has the entire context.
What's more helpful (in my view) for looking at this problem is:
>>> r = requests.get('https://httpbin.org/redirect/3')
>>> for resp in r.history:
... print(id(resp), end='->'); print([(id(_), len(_.history)) for _ in resp.history])
...
140058437748768->[]
140058437750976->[(140058437750976, 1)]
140058437751840->[(140058437750976, 1), (140058437751840, 2)]
Note that the 2nd response (which should have been the redirect request followed from the first item in the history) points to itself, not in fact to the first response.
While this is wrong, and can be fixed, it hasn't been noticed in years, because folks that care about the history are just looking at the top-level history (which contains everything) and don't actually need to look at each response's history.
I'd be in favor of just wiping those histories altogether instead of trying to make this look more correct. Either way, the place to fix this is in https://github.com/psf/requests/blob/15585909c3dd3014e4083961c8a404709450151c/requests/sessions.py#L182-L183
Namely, the first time through, we are called with the initial response that triggered a redirect, so we're doing
hist.append(resp)
resp.history = hist[1:]
This is correct because that 0th item in the final response's history has no other redirects.
We could also, write this two other ways:
resp.history = hist[:]
hist.append(resp)
This will create a new list based off of the existing history we know about, so that initial request will still be [], then the rest would look like:
... 140058437748768->[] 140058437750976->[(140058437748768, 0)] 140058437751840->[(140058437748768, 0), (140058437750976, 1)]
Alternatively, we could do
```py
hist.append(resp)
resp.history = hist[:-1]
Either way should make this work, but I think then, it has downstream affects on https://github.com/psf/requests/blob/15585909c3dd3014e4083961c8a404709450151c/requests/sessions.py#L190 which we can easily change to
if len(hist) >= self.max_redirects:
should I work on this?