aiohttp
aiohttp copied to clipboard
New app signals proposal to track connections and requests
Long story short
Right now there is not a set of signals in place to track connections and request for Aiohttp web server. The proposal would implement the following subset of signals, attached to the application.
on_connection_created(app, context)Triggered when a new connection has been created.on_connection_closed(app, context)Triggered when a connection has been closed.on_request_start(app, context, request)Triggered when a new request start.on_request_end(app, context, request, response)Triggered when a requests finishes.
All signals will receive as a parameter the app instance and a context that would help the user to pass information to the next semantic signal. For example, the context related with the connection will be instantiated when the on_connection_created is called and the value will be passed as a parameter for both connections methods, having the same for the request flavor.
These signals will be only triggered by the root app.
Taking into account that already exists a group of signals for the App and these differ a bit of the proposed ones, lack of context and so on. We could discuss on have a new module called web_tracing that will implement the same pattern as the client tracing [1], this might imply some name changing.
My 2cent about my cons with other ways to implement that:
- Middlewares are chained between the root app and nested apps, implement the request signals implicitly as a middleware IMHO is prone error and weak. Better having an ad-hoc set of signals that are never triggered by the nested apps.
- I would prefer to implement these signals out of the app scope, perhaps having them as a new parameter of the
run_appthat implements only these new signals. The problem with this solution is the incompatibility with environments that use the workers strategy to start the application such asgunicorn.
[1] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/tracing.py
Agree in general but let's polish client tracing first.
Also I think the best place for landing server signals is not Application but AppRunner.
Signals are decoupled from application logic most likely.
Stopped until #2686 gets done.
@asvetlov any idea how can we put the signals in the AppRunner making this feature available when gunicorn it's used?
I would like to throw in two ideas. I have an internal library that does request tracing and saves each request to the database. For our application it was essential that we grab all the outgoing traffic as often a request has financial consequences and we need to maintain full history.
The tracing feature seems very close to what we developed internally, however it missing:
-
Ability to grab
request_body. Is there any plan on adding a signal triggered after writing the body ? Perhaps I could help with that if I get the guidelines on desired naming of the signal. -
Ability to make
tracing_configtake in parameters from the callers context. Let me explain this with code:pivots = [{'name': 'balancing_group_id', 'value': group.id}] saver = get_request_saver(self.app, pivots) async with self.client.request_saver(saver): async with self.client.get(url) as resp: ...In the example above by passing the identifier we can later easily filter all the requests done in scope of the particular group.
Under the hoods this could be implemented as:
diff --git a/aiohttp/client.py b/aiohttp/client.py index 93eff65..b04abf0 100644 --- a/aiohttp/client.py +++ b/aiohttp/client.py @@ -139,10 +138,25 @@ class ClientSession: self._response_class = response_class self._ws_response_class = ws_response_class - self._trace_configs = trace_configs or [] - for trace_config in self._trace_configs: + self.task_locals = TaskLocals(loop=loop) + self._static_trace_configs = trace_configs + for trace_config in trace_configs: trace_config.freeze() + def _get_trace_configs(self): + return self.task_locals.get('trace_configs', []) + self._static_trace_configs + + def _set_trace_configs(self, tracing_configs): + self.task_locals.set('trace_configs', tracing_configs) + for trace_config in trace_configs: + trace_config.freeze() + + @async_contextmanager + async def trace_configs(self, trace_config): + self.set_trace_configs(tracing_configs) + yield + self.set_trace_configs([]) + def __init_subclass__(cls): warnings.warn("Inheritance class {} from ClientSession " "is discouraged".format(cls.__name__), @@ -256,7 +270,7 @@ class ClientSession: trace_config.trace_config_ctx( trace_request_ctx=trace_request_ctx) ) - for trace_config in self._trace_configs + for trace_config in self._get_trace_configs() ] for trace in traces:Implementation above would allow to both have the trace config global for the session as well as adding them dynamically with
async with session.trace_configs.For the sake of comlicity I should tell what
TaskLocalsis. It's the same concept ascontextvarswhich will be introduced in python 3.7 (https://docs.python.org/dev/library/contextvars.html#module-contextvars). The implementation I borrowed frompeewee_asynclibrary. It looks as follows:class TaskLocals: """Simple `dict` wrapper to get and set values on per `asyncio` task basis. The idea is similar to thread-local data, but actually *much* simpler. It's no more than a "sugar" class. Use `get()` and `set()` method like you would to for `dict` but values will be get and set in the context of currently running `asyncio` task. When task is done, all saved values is removed from stored data. """ def __init__(self, loop): self.loop = loop self.data = {} def get(self, key, *val): """Get value stored for current running task. Optionally you may provide the default value. Raises `KeyError` when can't get the value and no default one is provided. """ data = self.get_data() if data is not None: return data.get(key, *val) elif len(val): return val[0] else: raise KeyError(key) def set(self, key, val): """Set value stored for current running task. """ data = self.get_data(True) if data is not None: data[key] = val else: raise RuntimeError("No task is currently running") def get_data(self, create=False): """Get dict stored for current running task. Return `None` or an empty dict if no data was found depending on the `create` argument value. :param create: if argument is `True`, create empty dict for task, default: `False` """ task = asyncio.Task.current_task(loop=self.loop) if task: task_id = id(task) if create and not task_id in self.data: self.data[task_id] = {} task.add_done_callback(self.del_data) return self.data.get(task_id) def del_data(self, task): """Delete data for task from stored data dict. """ del self.data[id(task)]
Hi @kowalski thanks for your feedback, appreciate it!
Just a couple of comments to try to put all of us in the same page. This issue is about the tracing in the server side, to track those news connections and requests that flows into the server. For the Client side perspective, Aiohttp already has an implementation of the Client tracing since 3.0 [1].
So, first of all. This is about the server or the client side?
Regarding the Client side, current implementation allows passing static and dynamic configurations, maybe the solution is a bit overcomplicated but you should be able to make both things in your trace configs. The usage of one pattern or another was on the air at the moment of developing this feature, contextvars were discarded because Aiohttp gives support for Python versions > 3.5.
So the current implementation does allow to pass contextual information per request using the trace_request_ctx, but the developer has the freedom of using other mechanisms such as contextvars, TaskLocal, aiocontext, .... that will circumvent the current implementation and will allow pass information implicitly between the caller and the signals.
[1] https://docs.aiohttp.org/en/stable/client_advanced.html#client-tracing
Hi @pfreixes, sorry for confusion. I was not aware this issue is about the server-side tracing.
Indeed my comment was regarding the outgoing traffic, so the requests done with ClientSession.
Kindly point me what is the right way to make suggestions about extending that feature. Should I open another issue ?
I'm aware of tracing mechanism and I that I can pass the trace_request_ctx to get/post/put/delete methods. However I don't think it's very practical. Please consider a situation when a third-party library is used, which wraps the individual http calls into an application-logic methods. Take boto as an example. It doesn't provide access to individual http requests, it exposes services specific to AWS resources. To be able to use TraceConfig with variables passed from the context, boto would have to update it's interfaces to optionally take in trace_request_ctx on all the calls and pass it to underlying methods of ClientSession. It clearly will not happen.
On the other hand I can see a way to have context variables in the TraceConfig, but it's a hack. In my code I could have a TraceConfig subclass which holds the context in TaskLocals and exposes the async_contextmanager to set it.
For this reason my suggestion is to allow dynamic changes of ClientSession._trace_configs itself, without the need of changing calls to methods making requests.
please, @kowalski let's have a discussion in another issue that relates with client tracing and context integration.
Ok, I've splitted this to 2 new issues: https://github.com/aio-libs/aiohttp/issues/2754 (for context variables) https://github.com/aio-libs/aiohttp/issues/2755 (for getting request body)
Agree in general but let's polish client tracing first.
Any update on this issue? Is client implementation mature enough?