requests icon indicating copy to clipboard operation
requests copied to clipboard

Make sessions safe[r] in multi-process environment

Open rabbbit opened this issue 6 years ago • 8 comments

tldr; in multi-process environment (Celery) sessions might lead to request/responses being mixed up.

It is unsafe to use Session in a multi-process environment - if the fork happens after Session initialisation the underlying connection pool will be shared across both processes, leading to potentially dangerous and hard to debug issues.

I'm not sure what should happen - whather a code change is necessary, or a documentation change is enough. Please advise :)

Use case

It is likely to happen if Session is created at module load time, like:

class MyClient(object):
    session = requests.Session()

    def do_things(self, **params):
         self.session.get(**params)

or if a 3rd party client is imported at the module level, where it becomes totally invisible:


my_database_client = database.DatabaseClient(**params)

class MyDatabaseWrapper(object):

      def do_things(self, **params):
            my_database_client.update(**params)

This is particularly tricky in a Celery, where a previously working function might start causing troubles if invoked from Celery. Celery seems like a common Python use case.

I've seen this pattern in 3 different repos written by 3 different developers - it feels common enough for it to be a problem.

Reason

This is related to https://github.com/shazow/urllib3/issues/850 in urllib3, where it was stated that it's the callers' responsibility to worry about forking - in this case, it's Requests.

Expected Result

Ideally, a new Session with the same parameters would be started by Requests.

If that's too complicated, I'd expect an exception to be thrown if PID change was detected.

At the very least, docs should state the expected behaviour.

Actual Result

The responses are mixed up - one process might receive a response made for a call it didn't make.

Reproduction Steps

import os
import sys
import requests

MAX = 20
s = requests.Session()

for n in range(MAX):
    pid = os.fork()
    if pid == 0:
        try:
            # s.mount("http://", requests.adapters.HTTPAdapter())  # uncomment to fix
            r = s.get('http://httpstat.us/300?sleep=100')
        except Exception:
            # ignoring intermittent http errors
            pass
        sys.exit(0)
    else:
        try:
            r = s.get('http://httpstat.us/200')
        except Exception as exc:
            # ignoring intermittent http errors
            pass

        if r.status_code != 200:
            print 'pid: {} Call {}/{}. Wrong status code detected {}'.format(
                os.getpid(),
                n,
                MAX,
                r.status_code
            )
pid: 30996 Call 7/20. Wrong status code detected 300   
pid: 30996 Call 9/20. Wrong status code detected 300   
pid: 30996 Call 14/20. Wrong status code detected 300  
pid: 30996 Call 17/20. Wrong status code detected 300  
pid: 30996 Call 19/20. Wrong status code detected 300 

System Information

$ python -m requests.help
pawel@pawel-C02V306VHTDH ~/Uber/tokenizer-python $ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  }, 
  "cryptography": {
    "version": ""
  }, 
  "idna": {
    "version": "2.6"
  }, 
  "implementation": {
    "name": "CPython", 
    "version": "2.7.14"
  }, 
  "platform": {
    "release": "16.7.0", 
    "system": "Darwin"
  }, 
  "pyOpenSSL": {
    "openssl_version": "", 
    "version": null
  }, 
  "requests": {
    "version": "2.18.4"
  }, 
  "system_ssl": {
    "version": "100020cf"
  }, 
  "urllib3": {
    "version": "1.22"
  }, 
  "using_pyopenssl": false
}```

This command is only available on Requests v2.16.4 and greater. Otherwise,
please provide some basic information about your system (Python version,
operating system, &c).

rabbbit avatar Oct 02 '17 19:10 rabbbit