requests icon indicating copy to clipboard operation
requests copied to clipboard

Make sessions safe[r] in multi-process environment

Open rabbbit opened this issue 8 years ago • 8 comments

tldr; in multi-process environment (Celery) sessions might lead to request/responses being mixed up.

It is unsafe to use Session in a multi-process environment - if the fork happens after Session initialisation the underlying connection pool will be shared across both processes, leading to potentially dangerous and hard to debug issues.

I'm not sure what should happen - whather a code change is necessary, or a documentation change is enough. Please advise :)

Use case

It is likely to happen if Session is created at module load time, like:

class MyClient(object):
    session = requests.Session()

    def do_things(self, **params):
         self.session.get(**params)

or if a 3rd party client is imported at the module level, where it becomes totally invisible:


my_database_client = database.DatabaseClient(**params)

class MyDatabaseWrapper(object):

      def do_things(self, **params):
            my_database_client.update(**params)

This is particularly tricky in a Celery, where a previously working function might start causing troubles if invoked from Celery. Celery seems like a common Python use case.

I've seen this pattern in 3 different repos written by 3 different developers - it feels common enough for it to be a problem.

Reason

This is related to https://github.com/shazow/urllib3/issues/850 in urllib3, where it was stated that it's the callers' responsibility to worry about forking - in this case, it's Requests.

Expected Result

Ideally, a new Session with the same parameters would be started by Requests.

If that's too complicated, I'd expect an exception to be thrown if PID change was detected.

At the very least, docs should state the expected behaviour.

Actual Result

The responses are mixed up - one process might receive a response made for a call it didn't make.

Reproduction Steps

import os
import sys
import requests

MAX = 20
s = requests.Session()

for n in range(MAX):
    pid = os.fork()
    if pid == 0:
        try:
            # s.mount("http://", requests.adapters.HTTPAdapter())  # uncomment to fix
            r = s.get('http://httpstat.us/300?sleep=100')
        except Exception:
            # ignoring intermittent http errors
            pass
        sys.exit(0)
    else:
        try:
            r = s.get('http://httpstat.us/200')
        except Exception as exc:
            # ignoring intermittent http errors
            pass

        if r.status_code != 200:
            print 'pid: {} Call {}/{}. Wrong status code detected {}'.format(
                os.getpid(),
                n,
                MAX,
                r.status_code
            )
pid: 30996 Call 7/20. Wrong status code detected 300   
pid: 30996 Call 9/20. Wrong status code detected 300   
pid: 30996 Call 14/20. Wrong status code detected 300  
pid: 30996 Call 17/20. Wrong status code detected 300  
pid: 30996 Call 19/20. Wrong status code detected 300 

System Information

$ python -m requests.help
pawel@pawel-C02V306VHTDH ~/Uber/tokenizer-python $ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  }, 
  "cryptography": {
    "version": ""
  }, 
  "idna": {
    "version": "2.6"
  }, 
  "implementation": {
    "name": "CPython", 
    "version": "2.7.14"
  }, 
  "platform": {
    "release": "16.7.0", 
    "system": "Darwin"
  }, 
  "pyOpenSSL": {
    "openssl_version": "", 
    "version": null
  }, 
  "requests": {
    "version": "2.18.4"
  }, 
  "system_ssl": {
    "version": "100020cf"
  }, 
  "urllib3": {
    "version": "1.22"
  }, 
  "using_pyopenssl": false
}```

This command is only available on Requests v2.16.4 and greater. Otherwise,
please provide some basic information about your system (Python version,
operating system, &c).

rabbbit avatar Oct 02 '17 19:10 rabbbit

So my suggestion here is that we simply document this as a risk. It can’t happen merely by creating a Session before forking, it has to actually be used before the fork. The difficulty is that forking combined with multi threading and streaming means that we need to check PIDs constantly: additionally, there is no appropriate recovery in many cases.

As a result I suggest this is best handled at application scope: either create sessions after forking or wrap them in a data structure that understands your forking paradigm. There is just no general solution to this problem.

Lukasa avatar Oct 04 '17 20:10 Lukasa

I second to this suggestion. I was using same sessions object across multiple independent processes, and boy, I had to literally spend 3 hours scratching my head before I could finally pinpoint to this as a root cause.

Oddly though, my code was working perfectly on a windows platform. On Linux, I was having a issue with responses mixups. I think may be because windows and Linux handle multiprocessing differently? Not sure here.

thepegasos avatar Mar 30 '18 01:03 thepegasos

I think may be because windows and Linux handle multiprocessing differently? Not sure here.

They do. multiprocessing on Windows can't fork in the same was as linux so you were not actually sharing the same session across processes

sigmavirus24 avatar Mar 30 '18 14:03 sigmavirus24

I am also having issues with celery and requests, this results in unexplained hangups while .get'ing , bad content, and auth tokens being randomly invalidated.

sivang avatar Jul 06 '18 04:07 sivang

Quoting:

As a result I suggest this is best handled at application scope: either create sessions after forking or wrap them in a data structure that understands your forking paradigm. There is just no general solution to this problem.

sigmavirus24 avatar Jul 06 '18 12:07 sigmavirus24

Would using two session objects, one for authentication pre-fork, extracting the cookiejar , passing the cookiejar to the to-be-forked task (-celery) and then using it to do whatever needs to be done against the server work?

sivang avatar Jul 10 '18 12:07 sivang

So I think I've got it, and have a code recipe for a best practice, should I go ahead and write something and then submit a PR for the docs?

sivang avatar Jul 22 '18 08:07 sivang

@sivang Can you please share your code recipe for best practice?

JackieMa000 avatar Nov 15 '18 09:11 JackieMa000