requests

Debugging or tracing my code that calls requests may produce a different result from running it.

Open swmcl opened this issue 2 years ago • 6 comments

Summary.

Expected Result

Debugging or tracing my code should produce the same result as running it.

Actual Result

If I set a breakpoint somewhere related to the underlying urllib3.response.HTTPResponse class and view its data property in the watch list, then resume and run the program to the end, requests.Response.content may be empty, which differs from running the program without the debugger.

Reproduction Steps

import requests
resp=requests.get('https://www.google.com')
print(resp.text)
  1. Running the code above prints the HTML of google.com.
  2. Set a breakpoint at the last line of requests/adapters.py, which is: return self.build_response(request, resp)
  3. Debug the code above; the program pauses at that return statement.
  4. In the watch list, expand resp and you will see content in its data property.
  5. Resume the program; nothing is printed.

System Information

$ python -m requests.help
{                        
  "chardet": {           
    "version": "4.0.0"   
  },                     
  "charset_normalizer": {
    "version": "2.0.7"   
  },                     
  "cryptography": {      
    "version": "35.0.0"  
  },                     
  "idna": {              
    "version": "3.3"     
  },                     
  "implementation": {    
    "name": "CPython",   
    "version": "3.10.4"  
  },                     
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "101010cf",
    "version": "22.0.0"
  },
  "requests": {
    "version": "2.27.1"
  },
  "system_ssl": {
    "version": "101010ef"
  },
  "urllib3": {
    "version": "1.26.9"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": true
}

swmcl avatar Apr 23 '23 08:04 swmcl

The reason for the difference between debugging and running is:

  1. When response.content is accessed, it calls iter_content, which in turn calls stream() on raw (a urllib3.response.HTTPResponse) with decode_content=True.
  2. When we expand resp and view the data property, the property function data() calls read() rather than stream(), without passing decode_content, so read() falls back to self.decode_content.
  3. self.decode_content is False at this point because requests.adapters.HTTPAdapter.send creates the resp instance with decode_content=False.
  4. Running the code decodes the content without caching the data in urllib3.response.HTTPResponse._body, while debugging with the data property of the urllib3.response.HTTPResponse in the watch list does not decode it but caches the raw data in _body. After that, any access to requests.Response.content gets nothing from the HTTP socket through stream(), because the data has already been read and cached in _body, and stream() never looks at _body.
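
The steps above can be illustrated with a toy model. The sketch below does not use urllib3 itself: ToyRawResponse and content_of are hypothetical names invented for illustration, and the class only mimics the behavior described in this comment (a one-shot byte source, a data property that reads and caches everything, and a stream() that never looks at the cache). Decoding is left out to keep it short.

```python
import io


class ToyRawResponse:
    """Hypothetical, simplified stand-in for a raw HTTP response object.

    It models only the behavior described above: a one-shot byte source,
    a `data` property that reads and caches everything, and a `stream()`
    method that reads from the source and never consults the cache.
    """

    def __init__(self, payload: bytes) -> None:
        self._fp = io.BytesIO(payload)  # plays the role of the socket
        self._body = None               # cache filled only via `data`

    def read(self, amt=None, cache_content=False) -> bytes:
        chunk = self._fp.read() if amt is None else self._fp.read(amt)
        if cache_content:
            self._body = chunk
        return chunk

    @property
    def data(self) -> bytes:
        # This is what a debugger's watch list triggers when it expands the object.
        if self._body is not None:
            return self._body
        return self.read(cache_content=True)

    def stream(self, amt=4):
        # Reads from the source only; the cached `_body` is ignored.
        while True:
            chunk = self.read(amt)
            if not chunk:
                return
            yield chunk


def content_of(raw: ToyRawResponse) -> bytes:
    """Assemble the body by joining streamed chunks (a model of the content path)."""
    return b"".join(raw.stream())


# Plain run: nothing touches `data`, so the stream yields the whole body.
normal_run = content_of(ToyRawResponse(b"<html>hello</html>"))

# Debug run: expanding the object evaluates `data` first and drains the source.
inspected = ToyRawResponse(b"<html>hello</html>")
peeked = inspected.data
debug_run = content_of(inspected)

print(normal_run)  # b'<html>hello</html>'
print(peeked)      # b'<html>hello</html>'
print(debug_run)   # b''
```

In this model the bytes are not lost; they sit in the cache that the streaming path never reads, which matches the symptom of an empty content after inspecting data.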

swmcl avatar Apr 23 '23 09:04 swmcl

A person goes to an appointment with their doctor. At the end the doctor asks if the patient has any other concerns. The person raises one concern, "When I take my sledgehammer and hit my foot with it as hard as I can, the bones break, it hurts and I can't walk on it until I get the cast off and finish physical therapy. Even then there's some residual stiffness and pain. How do I avoid that?"


This isn't a bug in either requests or urllib3. Neither is designed for you to do what you're doing exactly and have no consequences. HTTP and TCP are stateful protocols. Changing the state of things without understanding the consequences will help you learn those consequences.

In fact, the very thing you're encountering is actually a feature. People often need to handle very large files on constrained memory. This allows them to not cache it all in memory and download it efficiently. Neither library will change this pattern. Requests does not need multiple paths just in case either, as that penalizes new users.
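
A minimal sketch of the large-file use case, assuming the usual requests pattern of stream=True together with iter_content. The helper names save_chunks and download, the chunk size, and the timeout are illustrative choices, not something prescribed in this thread. The demonstration at the bottom feeds in-memory chunks so it runs without network access.

```python
import os
import tempfile


def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path and return the bytes written."""
    written = 0
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def download(url, dest_path, chunk_size=64 * 1024):
    """Stream url to disk without holding the whole body in memory.

    Requires the third-party requests package.
    """
    import requests  # imported lazily so save_chunks works without it

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        return save_chunks(resp.iter_content(chunk_size=chunk_size), dest_path)


# Offline demonstration with in-memory chunks instead of a real response.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
demo_written = save_chunks([b"abc", b"", b"defgh"], demo_path)
print(demo_written)  # 8
```

With this pattern only one chunk is held in memory at a time, which is why the body can be consumed only once.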

sigmavirus24 avatar Apr 23 '23 11:04 sigmavirus24

Thanks for your reply @sigmavirus24!

Now I think the difference between debugging and running the program may be caused by an issue in PyCharm. I don't know what other IDEs do when I expand an object. It would be better if PyCharm didn't evaluate a property right away and instead showed a button that reveals the content, along with a warning, so that the programmer can choose to view a property while accepting the risk of changing the program's result.

But,

  1. Why does requests always read the content as a stream even if I set stream=False? When I have enough memory or the content is not too large, why doesn't it just call read() and point to, or copy, the data property (and hence the hidden _body attribute) of the urllib3.response.HTTPResponse?
  2. What if I have a large file to download, set stream to True, and debug my program inside requests.Response? The content will be filled with the entire large file in memory, which is not the expected behavior, although that is due to the PyCharm behavior I mentioned above. Why doesn't the content property raise an exception, or emit a warning, under stream=True and advise the programmer to call iter_content() manually instead?

swmcl avatar Apr 24 '23 00:04 swmcl

why does requests always read content as stream even if I set the stream=False?

One code path is easier to maintain and results in fewer bugs.

why doesn't it just call the read() and point to, or a copy of the data property, hence the _body hidden property, in the urllib3.response.HTTPResponse?

We don't access non-public properties, nor will we.

what if I have large file to download and set the stream to True and debugging my program in the requests.Response?

We can only do so much to prevent you from taking the sledgehammer to your foot. We do what we can to protect as many people as possible by default without compromising ergonomics. If someone isn't careful, they had the ability to be careful, and the docs will help them when they realize their mistake.

sigmavirus24 avatar Apr 24 '23 02:04 sigmavirus24

Thanks for the answers!

We don't access non public properties nor will we

Although _body of urllib3.response.HTTPResponse is a non-public field, the data property is public. Assigning the urllib3.response.HTTPResponse.data property to requests.Response._content when stream is False is easy and won't cause any problems, I think.

swmcl avatar Apr 24 '23 02:04 swmcl

It does. It will cause unexpected memory usage. You're now likely copying that data and doubling the usage in cases that involve multiprocessing and threads. It's not an acceptable risk.

sigmavirus24 avatar Apr 24 '23 12:04 sigmavirus24