
[Feature Request] PreProcess and PostProcess Callback Hooks

Open BSd3v opened this issue 3 months ago • 4 comments

I'd like to request PreProcess and PostProcess hooks that run within the scope of a callback.

Essentially, this would let developers intercept and modify a callback's arguments before the function is called, and its outputs after it returns. The PostProcess hook would run just before the response is sent back to the client.

This could potentially solve issues like this: https://github.com/plotly/dash/issues/262


Here is an example of how this could work; note that in the actual implementation the hooks wouldn't be invoked inside the callbacks themselves:

import dash
from dash import html, Input, Output, dcc, ctx
import time

server_side_stores = {}

def my_preprocess(args, kwargs, input_state_list):
    # Log and update args with server-side store data if available
    for i in range(len(args)):
        if 'id' in input_state_list[i]:
            store_id = input_state_list[i]['id']
            if store_id in server_side_stores:
                args[i] = server_side_stores[store_id]
    return args, kwargs

def my_postprocess(output_list, response):
    # Log and update server-side store with response data
    for i in range(len(output_list)):
        if 'id' in output_list[i]:
            store_id = output_list[i]['id']
            server_side_stores[store_id] = response[i]
            response[i] = {"timestamp": time.time()}
    return response

class MyStore(dcc.Store):
    def __init__(self, id, data, *args, **kwargs):
        server_side_stores[id] = data
        timestamped_data = {"timestamp": time.time()}
        super().__init__(id=id, data=timestamped_data, *args, **kwargs)

app = dash.Dash(__name__, suppress_callback_exceptions=True)

app.layout = html.Div(
    [
        MyStore(id="my-store", data={"key": "value"}),
        dcc.Input(id="input", type="text"),
        html.Div(id="output"),
    ],
    style={"padding": 50},
)

@app.callback(
    Output("output", "children"),
    Input("my-store", "data"),
)
def update_input(store_data):
    # Preprocess
    args, kwargs = my_preprocess(
        [store_data], {}, ctx.inputs_list + ctx.states_list
    )
    result = f"Original data: {args[0]}, Store timestamp: {store_data['timestamp']}"
    return result

@app.callback(
    Output("my-store", "data"),
    Input("input", "value"),
    prevent_initial_call=True,
)
def update_store(input_value):
    # Postprocess
    resp = my_postprocess([ctx.outputs_list], [{"key": input_value}])
    return resp[0]

if __name__ == "__main__":
    app.run(debug=True)

I recommend adding the hooks in _callback.py: the PreProcess hook in _initialize_context after func_args and func_kwargs are set, and the PostProcess hook in _prepare_response. With the pattern shown above there could be some performance drawbacks, so we might need to streamline how the inputs and outputs are iterated.

This could also be narrowed down to only the callbacks that need to be transformed/augmented in place.
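Until such hooks exist in Dash itself, the pattern can be approximated today by wrapping the callback function. This is a minimal sketch; the `with_hooks` decorator and its hook signatures are my own invention, not a Dash API:

```python
import functools

def with_hooks(preprocess=None, postprocess=None):
    """Run hypothetical pre/post hooks around an ordinary function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if preprocess:
                # PreProcess: may rewrite the arguments before the callback runs
                args, kwargs = preprocess(list(args), kwargs)
            result = func(*args, **kwargs)
            if postprocess:
                # PostProcess: may rewrite the result before it is returned
                result = postprocess(result)
            return result
        return wrapper
    return decorator

@with_hooks(
    preprocess=lambda args, kwargs: ([a * 2 for a in args], kwargs),
    postprocess=lambda result: result + 1,
)
def my_callback(value):
    return value

print(my_callback(5))  # preprocess doubles 5 to 10, postprocess adds 1 -> 11
```

In a Dash app, the same wrapper could sit between `@app.callback` and the function, which is essentially what framework-level hooks would formalize.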


This could also be used to allow DataFrames to be returned from a callback, to validate data before it reaches the frontend, etc.

BSd3v avatar Oct 08 '25 16:10 BSd3v

Reference: https://docs.djangoproject.com/en/5.2/topics/async/

I propose @never_cache:


@app.callback(
    Output("output", "children"),
    Input("my-store", "data"),
    never_cache=True,
    prevent_initial_call=True,
)
def update_input(store_data):
    # Preprocess
    args, kwargs = my_preprocess(
        [store_data], {}, ctx.inputs_list+ctx.states_list
    )
    result = f"Original data: {args[0]}, Store timestamp: {store_data['timestamp']}"
    return result

Args and kwargs support is a good call. We should also keep in mind the dash pages library and the Django framework: what is the application's heartbeat rhythm, and how do the hooks work with dcc.Store so they don't interfere with other Dash callbacks while still being able to run background tasks and scheduled events? The same goes for the many Flask @app.server.post routes.

We should consider the dash pages library and the Django framework, as well as a scheduler, and set up an application pulse for batching callbacks prior to the PreProcess and PostProcess. (Diagram: async tasks flowing from requests to outputs.)

pip-install-python avatar Oct 09 '25 06:10 pip-install-python

@pip-install-python

There should be no effect on pages, and this is opt-in: the dev who opts in needs to take these things into account in their own complex workflow.

Plotly would only provide the hooks; it's up to the developer how to use them.

If you are talking about dynamic layouts from pages, it'd be slightly different, but if you extend the class the way I showed in the example, those things would get loaded automatically.

BSd3v avatar Oct 09 '25 14:10 BSd3v

@BSd3v How would PreProcess and PostProcess hooks interact with callbacks triggered by dcc.Interval?

The dcc.Interval coordination challenge:

The dcc.Interval component creates a fundamental problem in multi-user Dash applications. Each component runs independently on every client's browser, incrementing its n_intervals property at the specified interval (default 1000ms). When multiple users access the application simultaneously, each interval fires separate callback requests to the server.

Current behavior without hooks: Consider three users with the same dashboard open, each with dcc.Interval(id='refresh', interval=5000) triggering a database-query callback. Every 5 seconds, the server receives three simultaneous requests, each executing the expensive query independently. With 50 concurrent users, the database faces 50 simultaneous queries every 5 seconds, a recipe for overload.

PreProcess hooks enable server-side coordination: with PreProcess hooks, Dash could implement request deduplication and result sharing at the framework level:

import asyncio

# Sketch of a hypothetical hook API: `context` attributes such as triggered_id,
# callback_id, skip_execution, and interval_ms are assumed, not existing Dash APIs.
class IntervalCoordinationHook:
    def __init__(self):
        self.cache = {}
        self.locks = {}
    
    async def pre_process(self, context):
        """Coordinate interval-triggered callbacks across clients"""
        # Identify interval-triggered callbacks
        if context.triggered_id and 'interval' in context.triggered_id:
            callback_key = f"{context.callback_id}:{context.triggered_timestamp}"
            
            # Check if result already cached from another client
            if callback_key in self.cache:
                context.cached_result = self.cache[callback_key]
                context.skip_execution = True
                return context
            
            # Acquire lock to prevent parallel execution
            if callback_key not in self.locks:
                self.locks[callback_key] = asyncio.Lock()
            
            async with self.locks[callback_key]:
                # Double-check cache after acquiring lock
                if callback_key in self.cache:
                    context.cached_result = self.cache[callback_key]
                    context.skip_execution = True
                else:
                    # First request proceeds, others will wait
                    context.cache_key = callback_key
        
        return context
    
    async def post_process(self, context, result):
        """Cache the result so other clients can reuse it"""
        if hasattr(context, 'cache_key'):
            self.cache[context.cache_key] = result
            # Expire after the interval duration
            asyncio.create_task(
                self._expire_cache(context.cache_key, context.interval_ms)
            )
        return result

    async def _expire_cache(self, key, interval_ms):
        await asyncio.sleep(interval_ms / 1000)
        self.cache.pop(key, None)

For high-frequency interval callbacks, this could reduce bandwidth consumption by 10-100x depending on data volatility, enabling sub-second refresh rates without overwhelming the network.
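The deduplication idea can be demonstrated without Dash at all. In this sketch, fifty concurrent "clients" requesting the same key share a single expensive query (the `Dedup` class and its names are illustrative):

```python
import asyncio

class Dedup:
    """Share one expensive result among concurrent requests for the same key."""
    def __init__(self):
        self.cache = {}
        self.locks = {}
        self.calls = 0  # how many times the expensive work actually ran

    async def fetch(self, key, expensive):
        lock = self.locks.setdefault(key, asyncio.Lock())
        async with lock:
            if key not in self.cache:  # only the first holder does the work
                self.calls += 1
                self.cache[key] = await expensive()
        return self.cache[key]

async def main():
    dedup = Dedup()

    async def query():
        await asyncio.sleep(0.01)  # stand-in for a slow database query
        return "rows"

    results = await asyncio.gather(*(dedup.fetch("q1", query) for _ in range(50)))
    return dedup.calls, results

calls, results = asyncio.run(main())
print(calls, len(results))  # 1 50 -> one query served all fifty requests
```

A real hook would also need cache expiry, as in the `_expire_cache` idea above, so stale results don't outlive the interval.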

Django's async patterns: a blueprint for Dash hooks

The unified middleware pattern: Django's most powerful insight is the "onion layer" pattern, where middleware wraps the next component in a single callable rather than separate pre/post hooks:

class SimpleMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response  # Next middleware or view
    
    def __call__(self, request):
        # Pre-processing: executed top-down
        # (prepare and enhance are placeholders for this middleware's own logic)
        modified_request = self.prepare(request)
        
        response = self.get_response(modified_request)  # Pass to next layer
        
        # Post-processing: executed bottom-up
        return self.enhance(response)

Why this matters for Dash: The unified pattern guarantees PreProcess and PostProcess operations are paired, enabling proper resource management. A hook that opens a database transaction in PreProcess can reliably close it in PostProcess even if the callback raises an exception. Separate hooks risk orphaned resources when only one executes.

class DashHook:
    def __init__(self, get_result):
        self.get_result = get_result  # Next hook or callback executor
    
    async def __call__(self, context):
        # PreProcess
        context = await self.validate_and_transform(context)
        
        # Execute callback
        result = await self.get_result(context)
        
        # PostProcess
        return await self.format_and_cache(result, context)
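The pairing guarantee is easy to see with a runnable toy chain (no Dash involved; `LogHook` and `trace` exist only to record execution order, and try/finally is what keeps pre and post paired even on errors):

```python
import asyncio

class LogHook:
    """Onion-style hook: pre runs top-down, post runs bottom-up."""
    def __init__(self, name, get_result, trace):
        self.name = name
        self.get_result = get_result
        self.trace = trace

    async def __call__(self, context):
        self.trace.append(f"{self.name}:pre")       # PreProcess
        try:
            return await self.get_result(context)   # next hook or the callback
        finally:
            self.trace.append(f"{self.name}:post")  # PostProcess, even on error

async def main():
    trace = []

    async def callback(context):
        trace.append("callback")
        return "result"

    chain = LogHook("outer", LogHook("inner", callback, trace), trace)
    value = await chain({})
    return value, trace

value, trace = asyncio.run(main())
print(trace)  # ['outer:pre', 'inner:pre', 'callback', 'inner:post', 'outer:post']
```

Note the symmetry: the outermost hook's PostProcess runs last, so it can observe everything the inner layers did.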

Django leverages Python's contextvars module for context-local state that automatically propagates across async boundaries:

import contextvars

request_id = contextvars.ContextVar('request_id')

async def middleware(request):
    # next_middleware and generate_id stand in for the surrounding framework
    token = request_id.set(generate_id())
    try:
        response = await next_middleware(request)
        return response
    finally:
        request_id.reset(token)

This pattern solves a critical problem in async Dash hooks: maintaining callback-specific state across hook chains without explicit passing. A PreProcess hook can set authentication data that's automatically available in the callback and PostProcess hook:

# In PreProcess hook
callback_user = contextvars.ContextVar('callback_user')
callback_user.set(authenticated_user)

# In callback (no changes needed)
def my_callback(value):
    user = callback_user.get()  # Automatically available
    return fetch_user_data(user)

# In PostProcess hook
user = callback_user.get()  # Same context preserved
audit_log(user, result)     # result: the callback's return value

Critical advantage: Unlike thread-local storage, contextvars work correctly with async/await, ensuring each callback execution maintains isolated state even when multiple callbacks run concurrently in the same event loop.
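That isolation can be verified directly: each asyncio task runs in its own copy of the context, so two concurrent "callbacks" setting the same ContextVar never see each other's value:

```python
import asyncio
import contextvars

callback_user = contextvars.ContextVar("callback_user")

async def handle_request(user):
    callback_user.set(user)    # set in this task's private context copy
    await asyncio.sleep(0.01)  # yield so the two tasks interleave
    return callback_user.get() # still our own value, not the other task's

async def main():
    # gather wraps each coroutine in a Task, which copies the current context
    return await asyncio.gather(handle_request("alice"), handle_request("bob"))

users = asyncio.run(main())
print(users)  # ['alice', 'bob']
```

With thread-locals, both coroutines running on the same event-loop thread would share one slot and clobber each other; contextvars avoids that.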

Async grouping with asyncio.gather and TaskGroup

Django's approach to concurrent operations within request processing offers direct lessons for Dash callback optimization:

async def new_contributor(request):
    # Execute multiple async operations concurrently
    is_registered, avatar, recent_posts = await asyncio.gather(
        is_email_registered(email),
        get_gravatar(email),
        fetch_recent_posts(email),
        return_exceptions=True  # Don't fail all if one fails
    )
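The effect of return_exceptions=True is worth spelling out: a failing coroutine comes back as an exception object in the results list instead of propagating and discarding its siblings' results:

```python
import asyncio

async def ok():
    return 1

async def boom():
    raise ValueError("bad input")

async def main():
    # Without return_exceptions=True, boom() would raise out of gather and
    # ok()'s result would be lost; with it, both outcomes arrive as values.
    return await asyncio.gather(ok(), boom(), return_exceptions=True)

results = asyncio.run(main())
print(results[0], type(results[1]).__name__)  # 1 ValueError
```

The caller then inspects each entry with isinstance(entry, Exception), as the MultiValidationHook sketch below does.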

For Dash hooks, this enables parallel execution of independent hook operations:

class MultiValidationHook:
    async def pre_process(self, context):
        """Run multiple validations concurrently"""
        validation_results = await asyncio.gather(
            self.validate_auth(context),
            self.validate_rate_limit(context),
            self.validate_input_schema(context),
            return_exceptions=True
        )
        
        # Aggregate results
        for result in validation_results:
            if isinstance(result, Exception):
                raise ValidationError(str(result))
        
        return context

Python 3.11+ TaskGroup provides structured concurrency with better error handling:

async def pre_process(self, context):
    async with asyncio.TaskGroup() as group:
        auth_task = group.create_task(validate_auth(context))
        rate_task = group.create_task(check_rate_limit(context))
        cache_task = group.create_task(fetch_cached_data(context))
    
    # Tasks automatically awaited; automatic cancellation on exception
    context.auth_result = auth_task.result()
    context.within_rate_limit = rate_task.result()
    context.cached_data = cache_task.result()
    return context

This is particularly powerful for PostProcess hooks that need to perform multiple I/O operations (logging, caching, notifications) without blocking callback completion.

pip-install-python avatar Oct 09 '25 16:10 pip-install-python

@pip-install-python

The PreProcess and PostProcess hooks would be available for any workflow; it is up to the dev to design them so that connections are closed in the event of an error. While connection pooling and keeping a connection open during the lifecycle of the callback is beneficial, I'm unsure you would want to pass a connection through all three phases: pre, during, and post.

As far as auth goes, you can already do something similar with the existing custom_data hook. I use this in my current flow.

As far as async execution, it would be possible; however, the functions that would work there are synchronous (using asyncio.run will cause issues if an event loop is already running). It'd be better to let the dev handle the async logic inside the hook itself, because you don't want the hook to run the callback, since the hooks can alter both the inputs and the outputs.
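The asyncio.run pitfall mentioned above is easy to reproduce: calling it from inside an already-running event loop raises immediately rather than nesting loops:

```python
import asyncio

async def inner():
    return 42

async def outer():
    coro = inner()
    try:
        # asyncio.run() refuses to start a second loop while this
        # coroutine's own event loop is already running
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # tidy up the never-awaited coroutine
        return str(exc)

message = asyncio.run(outer())
print(message)  # "asyncio.run() cannot be called from a running event loop"
```

This is why synchronous hooks can't simply delegate to asyncio.run inside an async-capable server; the hook itself has to be written as a coroutine, or the dev has to schedule the work on the existing loop.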

BSd3v avatar Oct 09 '25 16:10 BSd3v