redis-limpyd [idea] NoCollectionPKField

Sometimes we know that we will always retrieve the pk from indexes, other objects (foreign keys) or other references (for example a RedisModel linked to a Django model). In these cases the collection key is useless, and with a huge data set it could be a waste of resources. What can we do? One idea is a "NoCollectionPKField", which would override the "collection" behaviour. Another idea could be to add a "collect=False" parameter to the PKField.

Jun 22 '12 08:06 yohanboniface

Thinking about it, it seems to me that the "collection=False" parameter is a bit more convenient: it allows using both PKField and AutoPKField without the collection.

By the way, one question is: what should happen when someone asks for the collection of a model that has none? Raise? Return an empty set? I'd vote for raising, for "explicit vs implicit" reasons.
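A minimal sketch of the "raise" option, using a toy in-memory model rather than limpyd itself. `ToyModel`, `use_collection` and `CollectionDisabledError` are hypothetical names for illustration only; none of them are part of limpyd's API.

```python
class CollectionDisabledError(Exception):
    """Raised when a collection is requested on a model that keeps none."""


class ToyModel:
    # Hypothetical flag, not a limpyd attribute.
    use_collection = True
    _pks = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._pks = set()  # one pk set per model class

    def __init__(self, pk):
        self.pk = pk
        # Only record the pk when the model keeps a collection.
        if self.use_collection:
            type(self)._pks.add(pk)

    @classmethod
    def collection(cls):
        # Explicit failure instead of silently returning an empty set.
        if not cls.use_collection:
            raise CollectionDisabledError(
                "%s does not maintain a collection" % cls.__name__)
        return set(cls._pks)


class Tracked(ToyModel):
    pass


class Untracked(ToyModel):
    use_collection = False


Tracked("a")
Untracked("b")
```

With this shape, `Tracked.collection()` returns the stored pks while `Untracked.collection()` fails loudly, so a caller cannot mistake "no collection kept" for "no objects exist".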

Jun 24 '12 13:06 yohanboniface

I vote for raising too!

Jun 29 '12 22:06 twidi

@yohanboniface @twidi I'm interested in how others have solved this problem. At the moment I use a PKField(use_collection=False) and have had to override methods in my own class PKField(fields.PKField), etc.

Is there a better way?

May 12 '16 09:05 glynjackson

A few years later, I am not sure what the idea was. To skip storing the collection in a redis set?

For simple collections it would be easy I guess, but for more complex ones, the collection on the redis side is needed because it is used to do many operations on the set.

Some time should be spent on it.

@glynjackson: any way to see your implementation?

May 12 '16 10:05 twidi

I think my situation is typical. I had broadcasts, a SortedSet of IDs relating to notification schedules.

For example,


class PingSchedule(BaseRedisModel):
    """
    Used as a replacement to redis-limpyd standard collection allowing the use of a 'SortedSet' lookup.
    Used in celery tasks to look-up the broadcast based on timestamp given.
    """
    limpyd_collection = False
    id = PKField(unique=True)
    broadcasts = SortedSetField()

This results in:

:pingschedule:bucket1:broadcasts
:pingbroadcast:collection

Where :pingbroadcast:collection is never used again, and only ever contains 1 item. I only append/delete from broadcasts in bucket 1.

For example I used it like this:

Tasks.py
schedule = PingSchedule(schedule_node_location_one).broadcasts.zrangebyscore(0, present)
for broadcast_id in schedule:
    broadcast = PingBroadcast(broadcast_id)
    content_object = broadcast.get_content_object()
    # etc.

So as I know the bucket ID all I need in this case is:

:pingschedule:bucket1:broadcasts
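To make the lookup concrete, here is a rough pure-Python stand-in for the sorted-set query above. It only imitates the inclusive min/max behaviour of Redis's ZRANGEBYSCORE; `ToySortedSet`, the member names and the timestamps are made up for illustration and are not limpyd or redis-py code.

```python
class ToySortedSet:
    """In-memory imitation of a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Adding an existing member just updates its score.
        self._scores[member] = score

    def zrem(self, member):
        self._scores.pop(member, None)

    def zrangebyscore(self, low, high):
        # Both bounds are inclusive; results are ordered by score,
        # then by member name for equal scores.
        hits = [(s, m) for m, s in self._scores.items() if low <= s <= high]
        return [m for s, m in sorted(hits)]


# Scores are the timestamps at which each broadcast becomes due.
broadcasts = ToySortedSet()
broadcasts.zadd("broadcast:7", 1463050000)
broadcasts.zadd("broadcast:3", 1463040000)
broadcasts.zadd("broadcast:9", 1463090000)

present = 1463060000
due = broadcasts.zrangebyscore(0, present)
```

Here `due` holds only the broadcasts scheduled at or before `present`, which is all the task needs once the bucket id is known.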

On models where I did not want a collection to be created I add limpyd_collection=False, and BaseRedisModel and the fields look for that property.

For example one override is:

class PKField(fields.PKField):
    def set(self, value):
        """
        Override the default setter to check uniqueness, deny updating, and add
        the new pk to the model's collection only if model uses a collection.

            Note:
                The value is not saved as a field in redis, because we don't need this.
                On an instance, we have the _pk attribute with the pk value, and when
                we ask for a collection, we get some pks which can be used to instantiate
                new objects (holding the pk value in _pk)

        """
        # Deny updating of an already set pk
        if self._set:
            raise ValueError('A primary key cannot be updated')

        # Validate and return the value to be used as a pk
        value = self._validate(value)

        # Tell the model the pk is now set
        self._instance._set_pk(value)
        self._set = True

        # Only create a collection if 'use_collection' is set on the redis model.
        if self._instance.has_collection():
            # We have a new pk, so add it to the collection
            self.connection.sadd(self.collection_key, value)

        # Finally return 1 as we did a real redis call to the set command
        return 1

Doing this for everything that creates a collection took a while.

May 12 '16 11:05 glynjackson

@twidi Sorry, I did not look at the date! My bad! But, I guess it's still a requirement. There are lots of times where you already know the ID and don't need a collection set created in limpyd, again! I've only been looking at this today, so I'm not familiar with the package.

May 12 '16 11:05 glynjackson

@twidi , sorry to fill up your inbox :) Looking at this more closely, my example is trying to replicate creating a collection in limpyd. I think it would be more appropriate to have something like InstanceHashCollectionField and InstanceHashSortedCollectionField where ONLY a collection set is created i.e.:


class PingSchedule(BaseRedisModel):

    id = PKField(unique=True)
    collection = InstanceHashCollectionField()

Redis:

PingSchedule:ID:collection

The PKField is used only in the creation of the collection in Redis. I'm sure then it would allow the use of the collection manager for lookups.

Python:

PingSchedule(ID).collection().sort()

Would something like this be better, possible?

May 12 '16 11:05 glynjackson

Oh, I think I understand now: it's only about the set that stores all the PKs of a model's objects? (Which, by the way, is also used for collection filtering.)

May 12 '16 11:05 twidi

Yep, sorry about my long explanation 👎. Because filtering depends on that set, my first example would be a bad idea. However, if we had something like InstanceHashCollectionField and InstanceHashSortedCollectionField where only a collection is created, I'm sure this would not upset filtering too much, right?

May 12 '16 11:05 glynjackson

I have no time to dig into this right now, sorry :-/

About the filtering, I don't remember if the filtering uses the default collection or not.

Also, although I use limpyd very often, I haven't touched the (complex) internals in a while, so I can't give a more detailed answer without digging deep into the code.

May 12 '16 12:05 twidi

@twidi Thanks regardless, it's a fantastic package. I've only spent a small amount of time digging into it anyhow. I'm sure I can find a good solution.

May 12 '16 12:05 glynjackson

Thank you for your kind words :) And good luck!

May 12 '16 13:05 twidi

Turns out to be easy. I've added a new property called limpyd_collection which dictates if limpyd should create a collection in Redis. Here is a basic example of how I implemented it in the end.

I added a base model which extends RedisModel with a default value for limpyd_collection (used to determine whether limpyd should create a collection set in Redis):

class BaseRedisModel(model.RedisModel):
    limpyd_collection = True

Example of two models, with and without a collection:

NoCollectionInstance:

class NoCollectionInstance(BaseRedisModel):
    limpyd_collection = False
    id = PKField(unique=True)
    broadcasts = InstanceHashSortedCollectionField()

CollectionInstance:

class CollectionInstance(BaseRedisModel):
    limpyd_collection = True # can be omitted
    id = PKField(unique=True)
    broadcasts = InstanceHashSortedCollectionField()

Next, override PKField to stop the creation of a collection set. I extended PKField to allow the following:

class PKField(fields.PKField):
    def set(self, value):
        # Deny updating of an already set pk
        if self._set:
            raise ValueError('A primary key cannot be updated')
        value = self._validate(value)
        self._instance._set_pk(value)
        self._set = True
        # Only create a collection if 'limpyd_collection' is set true.
        if self._instance.limpyd_collection:
            # We have a new pk, so add it to the collection
            self.connection.sadd(self.collection_key, value)
        return 1

This allows:

NoCollectionInstance(id='1', broadcasts=[timestamp, self.id])
CollectionInstance(id='2', broadcasts=[timestamp, self.id])

Next, limpyd checks for a PK using the collection set stored in Redis. Since my NoCollectionInstance model does not have one, this check fails, so I overrode exists on the base model:

from limpyd import model


class BaseRedisModel(model.RedisModel):
    limpyd_collection = True

    @classmethod
    def exists(cls, **kwargs):
        if not kwargs:
            raise ValueError(u"`Exists` method requires at least one kwarg.")
        # Special case to check for a simple pk
        if len(kwargs) == 1 and cls._field_is_pk(list(kwargs.keys())[0]):
            if cls.limpyd_collection:
                # Match using the standard pk collection.
                return cls.get_field('pk').exists(list(kwargs.values())[0])
            else:
                # No collection to match against, so skip the check.
                # TODO: add matching against a custom collection.
                return True
        try:
            cls.collection(**kwargs).sort(by='nosort')[0]
        except IndexError:
            return False
        else:
            return True
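One possible direction for the TODO in the override above, sketched under stated assumptions: rather than always returning True, a model without a collection could ask Redis whether one of the instance's own keys exists. `FakeConnection`, `pk_exists_without_collection` and the `<model>:<pk>:<field>` key layout are hypothetical, chosen only to mirror the keys shown earlier in the thread; this is not how limpyd builds its keys internally.

```python
class FakeConnection:
    """Dict-free stand-in exposing only an `exists` call, so the sketch runs without a server."""

    def __init__(self, keys):
        self._keys = set(keys)

    def exists(self, key):
        # Mimics an integer-style reply: 1 if the key is present, else 0.
        return 1 if key in self._keys else 0


def pk_exists_without_collection(connection, model_name, pk, field_name):
    # Assumed key layout "<model>:<pk>:<field>"; adjust to the real layout.
    key = "%s:%s:%s" % (model_name, pk, field_name)
    return bool(connection.exists(key))


conn = FakeConnection({"pingschedule:bucket1:broadcasts"})
found = pk_exists_without_collection(conn, "pingschedule", "bucket1", "broadcasts")
missing = pk_exists_without_collection(conn, "pingschedule", "bucket2", "broadcasts")
```

A caveat with this approach: Redis removes a sorted set key once its last member is deleted, so an instance whose only field is an empty sorted set would be reported as not existing.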

The above allows the following without any errors:

NoCollectionInstance(id='1')
CollectionInstance(id='2')

or any gets,

NoCollectionInstance(id='1').broadcasts.zrangebyscore(0, 0)
CollectionInstance(id='2').broadcasts.zrangebyscore(0, 0)

or deletes,

NoCollectionInstance(id='1').delete()
CollectionInstance(id='2').delete()

Last is the use of the collection manager. Initially, the collection manager was somewhat confusing. I still wanted to use a manager on limpyd_collection=False models; this just required a custom manager to work.

May 17 '16 11:05 glynjackson
