dogpile.cache icon indicating copy to clipboard operation
dogpile.cache copied to clipboard

Option to fail quickly instead of blocking when initial generation is happening?

Open sqlalchemy-bot opened this issue 10 years ago • 12 comments

Migrated issue, originally created by Marc Abramowitz (msabramo)

Current behavior is that if the cache key is empty or invalidated, the first worker will start regenerating and other workers will block until the first worker releases the dogpile lock. (for an expired key, the workers don't block; they return the expired value).

I'm wondering if it makes sense to have an option to make the workers fail quickly instead of blocking. I hate to tie up Python workers that are just waiting for a lock.

I'm doing it in app code now by essentially doing a get and checking for NO_VALUE and if NO_VALUE is there, I call set with a little busy error response and then call the original function and then cache the value that comes back from it. So the first worker will go through this and will set the little busy error response so that other workers will see that until the regen is complete and the new value is cached.

I'm wondering if a pattern like this could be put in dogpile.cache?

I'd be willing to take a stab at a PR. I just want to make sure it's something that you might be interested in before I start coding. Or maybe there is already an even easier pattern to do this?

sqlalchemy-bot avatar Aug 16 '14 07:08 sqlalchemy-bot

Marc Abramowitz (msabramo) wrote:

I'm imagining maybe something where in addition to get_or_create there is get_or_set -- instead of taking a creator_func, it takes a default value to stick in the cache that expires immediately.

sqlalchemy-bot avatar Aug 16 '14 17:08 sqlalchemy-bot

Marc Abramowitz (msabramo) wrote:

Hmmm, maybe get_or_set is the wrong semantics, because we want a code path where the first worker calls the creator_func and subsequent workers return the default value (much like what happens with expired values).

So now I'm thinking that there could be a default_value parameter to get_or_create.

sqlalchemy-bot avatar Aug 16 '14 17:08 sqlalchemy-bot

Michael Bayer (zzzeek) wrote:

so....you still want the error raise? if so, you can inject this just by using a custom lock with a zero-length timeout.

default_value sounds a little more workable.

sqlalchemy-bot avatar Aug 16 '14 18:08 sqlalchemy-bot

Marc Abramowitz (msabramo) wrote:

Yeah I think default_value might be the best.

Going to experiment a little with my app here. Things are a little complex here because we use an internal library on top of dogpile.cache (smlib.cache). That library is wrapping dogpile.cache and adding a couple of goodies (we may want to discuss with @sontek about which things in that library could go upstream into dogpile.cache, if you want them).

sqlalchemy-bot avatar Aug 16 '14 18:08 sqlalchemy-bot

Marc Abramowitz (msabramo) wrote:

What does the custom lock solution look like? Is it worth add to the recipes section of the docs?

sqlalchemy-bot avatar Aug 18 '14 13:08 sqlalchemy-bot

Michael Bayer (zzzeek) wrote:

somehting like

#!python

class GiveUpLock(object):
    def __init__(self):
        self.lock = threading.Lock()

    def acquire(self, wait=True):
        if not self.lock.acquire(wait):
            # add time.sleep() here, a loop, if you want a timeout of 
            # .5 sec or something 
            raise Exception("couldn't get lock")!

sqlalchemy-bot avatar Aug 18 '14 14:08 sqlalchemy-bot

We are also very interested in this feature, thread is quite old, will you accept PR that makes this configurable on existing interface? Do you see any obstacles or challenges to implement it ?

bagerard avatar Jan 22 '21 10:01 bagerard

hey there -

which feature, the "default_value" argument? We can accept PRs for that sure. the part about the lock raising an error here I'm not following it right now as I dont work on dogpile that much.

zzzeek avatar Jan 22 '21 14:01 zzzeek

Our (simplified) use case is: Frontend calls HTTP endpoint --> Backend checks if value is in cache, if yes: return it, if not push a job to celery and return HTTP 102 (Processing) and have frontend retrying the same later on

In that design we have a chance of having multiple celery worker active on the same task - 1 worker acquiring the lock and doing the work, and other workers just waiting which is not nice as they could be doing other work.

Thus our preferred scenario is that in case a celery worker can't acquire the lock immediately, we would let it fail with a given error, knowing that it means that we have another worker already handling the task.

From the interface, I'd imagine something like region.get_or_create(..., immediate_lock_or_raise=True)

and it would either acquire the lock and do the processing, or fail abruptly with CouldNotImmediateLock()

I didn't really understand what was explained above with the default_value, is it the same principle (i.e no blocking lock) but with returning a default value instead of raising an exception or am I missing something?

bagerard avatar Jan 25 '21 22:01 bagerard

Frontend calls HTTP endpoint --> Backend checks if value is in cache, if yes: return it, if not push a job to celery and return HTTP 102 (Processing) and have frontend retrying the same later on

get_or_create will always block when the current thread is chosen to generate the value, even if async_creation_runner is present, if there is no value in the cache. you wouldn't be able to return the 102 status. So I'm not sure how immediate_lock_or_raise suits that case.

I'm trying to think of how to do what you want using region directly and im not coming up with it, because really what you want to do looks like this:

  1. get value + time cached from cache (that is, call the backend directly)
  2. if time cached is expired, check if celery task is running for key, spin up celery task if not, return value
  3. if no value is in cache, check if celery task is running for key, spin up celery task if not, return HTTP 102

maybe we need a new kind of exception class that can be raised by the creator, called "Generating". your creator would spin the celery task and then raise "Generating", and get_or_create just raises that if there was no cache value present. you would still need to be using async_creation_runner for this.

I haven't looked at how async_creator works in a long time but take a look at it since if you want the "lock is set but the task is asynchronous" thing to happen it would have to be involved here.

zzzeek avatar Jan 26 '21 00:01 zzzeek

I believe the notion of "default_value" is essentially the same, or at least can be adapted to be the same as, the "Generating" exception I referred towards above. I think a better name for this parameter might be "interim_value", something like that. here's how it would work for your case, noting the use of pseduocode to indicate where your celery task would run:


def async_creation_runner(cache, somekey, creator, mutex):
    run_as_celery_task(
        create_the_value=creator,
        populate_into_the_cache_using_key=somekey,
        release_this_mutex_when_done=mutex
    )


region = make_region(
    async_creation_runner=async_creation_runner,
).configure(
    'dogpile.cache.memcached',
)

return_102 = object()   # a marker object

def serve_request(key):
    value = region.get_or_create(key, create_my_value, interim_value=return_102)
    if value is return_102:
        return http_response(102)
    else:
        return http_response(value)

above, the contract of "interim_value" is:

  1. the async_creation_runner must be set for the region. This allows the region to run the "creator" function in a background thread/process/etc. in a manner defined by the end user
  2. when get_or_create() is called, and a value is not available, the "interim_value" is returned instead. a. if the dogpile lock is available, it is acquired and the async creator is invoked b. if the dogpile lock is taken, nothing else is done as the creator is still running

the difference w/ interim_value is that without it, right now dogpile doesn't do the "async" creator if no value is present.

zzzeek avatar Jan 26 '21 15:01 zzzeek

Thanks for the feedback, I'll look have a closer look at that part of the code before getting back to this

bagerard avatar Jan 26 '21 21:01 bagerard