python-consul
HTTPClient not thread safe
I want to use the python-consul module in a multiprocessing context: I need to execute x processes, and each of them fetches the nodes from Consul that are advertising a particular service, with a particular tag. To give you an idea, the code is roughly this:
import multiprocessing

def run():
    jobs = []
    for item in list_a:
        job = multiprocessing.Process(name='Process' + item, target=process,
                                      args=(item,))
        job.daemon = False
        jobs.append(job)
        job.start()
    for job in jobs:
        job.join()

def process(item):
    consul_api.health.service(MY_SERVICE, tag=item, dc=datacenter)
I printed the result of the request in base.py and there were wrong results. I added this to the service method of the Health class:
print "------------------%s %s %s" % (service,tag,self.agent.http.get(
CB.json(index=True),
'/v1/health/service/%s' % service,
params=params))
There were mixed results: for tag1, say, it returned results for tag2.
This seems to be related to the Session object that requests provides; see https://github.com/kennethreitz/requests/issues/1871 and https://github.com/kennethreitz/requests/issues/2766.
@jkhelil you are correct in assessing the root cause of this issue to be the requests Session implementation, as this is what the standard consul agent's HTTPClient uses under the hood. There are a few options here. Depending on your use case, it might be easiest to instantiate a pool of consul agents and draw from it per process, or to use one agent per process. The latter incurs the largest performance hit and is probably wasteful. If suitable, you can also look at using the consul.aio.Consul (Python 3.4.2+), consul.tornado.Consul, or consul.twisted.Consul implementations of the agent.
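For illustration, a minimal sketch of the one-agent-per-process option; MY_SERVICE and TAGS are placeholders standing in for the names in the original report, and consul.Consul() is the standard client constructor:

import multiprocessing

import consul

MY_SERVICE = 'web'        # placeholder service name
TAGS = ['tag1', 'tag2']   # stands in for list_a from the report

def process(item):
    # Build the client inside the worker, so no requests.Session is
    # ever shared across processes.
    consul_api = consul.Consul()
    consul_api.health.service(MY_SERVICE, tag=item)

if __name__ == '__main__':
    jobs = [multiprocessing.Process(target=process, args=(item,))
            for item in TAGS]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()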
From the library's perspective, we can investigate alternatives (feel free to propose a change or, even better, submit a PR); however, I do think this is not a priority at the moment, as it is a 10% case from what I can tell.
I think it's overkill for python-consul to use requests. The needed usage is so small that we don't really need the nicer API that requests provides. I think the way to go is to drop requests as a dependency and just use Python's built-in urllib. This is only complicated because of the Python 2/3 split, which hopefully six will smooth out.
If someone sees this and has interest, a PR would be very appreciated!
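For example, a rough sketch of what a requests-free GET could look like under that plan (http_get is a hypothetical helper, not python-consul's actual code), with six papering over the urllib split:

from six.moves.urllib.parse import urlencode
from six.moves.urllib.request import urlopen

def http_get(base_uri, path, params=None):
    # Plain stdlib GET: there is no shared Session object, so nothing
    # can leak between concurrent callers.
    uri = base_uri + path
    if params:
        uri = '%s?%s' % (uri, urlencode(params))
    response = urlopen(uri)
    return response.getcode(), response.read().decode('utf-8')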
Err... it depends on whether you would like to share sessions across requests and such. I could definitely see a use case for that. It also makes handling HTTPS easier, IMO. That being said, if you want to go the urllib route (or even httplib), you don't really need six to accomplish that.
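For instance, a minimal sketch of handling the 2/3 split without six, via a guarded import (again hypothetical, not code from this library):

try:
    # Python 3
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib import urlencode
    from urllib2 import urlopen

# Callers can then use urlopen/urlencode uniformly on either version.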
I think I am hitting this problem when this library is used as a backend for Celery job results, as reported here.
I see that what @cablehead suggested is a version of std.py that does its own session management rather than using the not-thread-safe session management in requests. Would a patch that uses urllib3 instead of requests be acceptable? That at least keeps connection pooling etc. in play, rather than either a naive implementation using the standard library or a potentially buggy homebrew?
P.S. I note from Stack Overflow that urllib is not thread-safe either.
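A minimal sketch of that proposed direction, assuming a plain urllib3.PoolManager (which urllib3 documents as thread-safe) in place of a shared requests.Session; the get helper here is hypothetical, not the library's API:

import urllib3

# One PoolManager per client keeps connection pooling without the
# thread-safety issues of a shared requests.Session.
http = urllib3.PoolManager()

def get(base_uri, path, params=None):
    response = http.request('GET', base_uri + path, fields=params)
    return response.status, response.data.decode('utf-8')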