redis-limpyd
redis-limpyd copied to clipboard
[idea] NoCollectionPKField
Sometimes, we know that we will always retrieve pk from indexes, other objects (foreign key) or other references (for example a RedisModel linked to a Django model). In these cases, the collection key is useless, and if the we have a huge set of data, this could be a leak of resources. What we can do? One idea is a "NoCollectionPKField", which will overwrite the "collection" behaviour. Another idea could be to add a "collect=False" parameter to the PKField.
Thinking to it, It seems to me that the "collection=False" parameter is a bit more convenient: this allow to use both the PKField of AutoPKField without the collection.
By the way, one question is: what to do when, using a model without collection, someone ask for its collection ? Raising ? Returning an empty set ? I'd vote for raising, for "explicit vs implicit" reasons.
i vote for raising too !
@yohanboniface @twidi Interested in how others have solved this problem. At the momnent I've used a PKField(use_collection=False)
then had to override methods on my own class PKField(fields.PKField)
etc.
Is got a better way?
A few years after, I am not sure what the idea was. To skip storing the collection in a redis set?
For simple collections it would be easy I guess, but for more complex ones, the collection on the redis side is needed because it is used to do many operations on the set.
Some time should be spent on it.
@glynjackson: any way to see your implementation?
I think my situation is typical. I had broadcasts
, a SortedSet of IDs relating to notification schedules.
For example,
class PingSchedule(BaseRedisModel):
"""
Used as a replacement to redis-limpyd standard collection allowing the use of a 'SortedSet' lookup.
Used in celery tasks to look-up the broadcast based on timestamp given.
"""
limpyd_collection = False
id = PKField(unique=True)
broadcasts = SortedSetField()
This results in:
:pingschedule:bucket1:broadcasts :pingbroadcast:collection
Where :pingbroadcast:collection
is never used again, and only ever contains 1 item. I only append/delete from broadcasts in bucket 1.
For example I used it like this:
Tasks.py
schedule = PingSchedule(schedule_node_location_one).broadcasts.zrangebyscore(0, present)
for broadcast_id in schedule:
broadcast = PingBroadcast(broadcast_id)
content_object = broadcast.get_content_object()
etc...
So as I know the bucket ID all I need in this case is:
:pingschedule:bucket1:broadcasts
On models where I did not want a collection to be created I add limpyd_collection=False
, where BaseRedisModel
and Feilds look for the property.
For example one override is:
class PKField(fields.PKField):
def set(self, value):
"""
Override the default setter to check uniqueness, deny updating, and add
the new pk to the model's collection only if model uses a collection.
Note:
The value is not saved as a field in redis, because we don't need this.
On an instance, we have the _pk attribute with the pk value, and when
we ask for a collection, we get somes pks which can be used to instanciate
new objects (holding the pk value in _pk)
"""
# Deny updating of an already set pk
if self._set:
raise ValueError('A primary key cannot be updated')
# Validate and return the value to be used as a pk
value = self._validate(value)
# Tell the model the pk is now set
self._instance._set_pk(value)
self._set = True
# Only create a collection if 'use_collection' is set on the redis model.
if self._instance.has_collection():
# We have a new pk, so add it to the collection
self.connection.sadd(self.collection_key, value)
# Finally return 1 as we did a real redis call to the set command
return 1
This took a while on anything that creates a collection.
@twidi Sorry, I did not look at the date! My bad! But, I guess it's still a requirement. There are lots of times where you already know the ID and don't need a collection set created in limpyd, again! I've only been looking at this today, so I'm not familiar with the package.
@twidi , sorry to fill up your inbox :) Looking at this more closely, my example is trying to replicate creating a collection in limpyd. I think it would be more appropriate to have something like InstanceHashCollectionField
and InstanceHashSortedCollectionField
where ONLY a collection set is created i.e.:
class PingSchedule(BaseRedisModel):
id = PKField(unique=True)
collection = InstanceHashCollectionField
Redis:
PingSchedule:ID:collection
The PKField
is used only in the creation of the collection in Redis. I'm sure then it would allow the use of the collection manager for lookups.
Python:
PingSchedule(ID).collection().sort()
Would something like this be better, possible?
OH I think I understand know, it's only about the set that stores all the PKs of objects of a model? (which is by the way also used for collections filtering)
Yep, sorry about my long explanation 👎 . It's because of the dependency on filtering my first example would be a bad idea. However, if we had something like InstanceHashCollectionField
and InstanceHashSortedCollectionField
where only a collection is created I'm sure this would not upset filtering too much, right?
I have no time to dig into this right now, sorry :-/
About the filtering, I don't remember if the filtering uses the default collection or not.
Also, although I use limpyd very often, I didn't touch the (complex) internals for a while so I'm not able to answer in a more advanced way before intensively digging through the code
@twidi Thanks regardless, it's a fantastic package. I've only spent a small amount of time digging into it anyhow. I'm sure I can find a good solution.
Thank you for your kind words :) And good luck!
Turns out to be easy. I've added a new property called limpyd_collection
which dictates if limpyd should create a collection in Redis. Here is a basic example of how I implemented it in the end.
I added a base model which extends RedisModel
with a default value for limpyd_collection
(used to determan if limpyd should create a collection set in redis):
class BaseRedisModel(model.RedisModel):
limpyd_collection = True
Example of 2 modles, collection and non-collection option:
NoCollectionInstance:
class NoCollectionInstance(BaseRedisModel):
limpyd_collection = False
id = PKField(unique=True)
broadcasts = InstanceHashSortedCollectionField()
CollectionInstance:
class CollectionInstance(BaseRedisModel):
limpyd_collection = True # can be omitted
id = PKField(unique=True)
broadcasts = InstanceHashSortedCollectionField()
Next, override PKField
to stop the creation of a collection set. I extended PKField
to allow the following:
class PKField(fields.PKField):
def set(self, value):
# Deny updating of an already set pk
if self._set:
raise ValueError('A primary key cannot be updated')
value = self._validate(value)
self._instance._set_pk(value)
self._set = True
# Only create a collection if 'limpyd_collection' is set true.
if self._instance.limpyd_collection:
# We have a new pk, so add it to the collection
self.connection.sadd(self.collection_key, value)
return 1
This allows:
NoCollectionInstance(id='1', broadcasts=[timestamp, self.id]) CollectionInstance(id='2', broadcasts=[timestamp, self.id])
Next, limpyd checks for a PK using the collection set stored in Redis, since my NoCollectionInstance
model does not have one this will fail.
class BaseRedisModel(model.RedisModel):
limpyd_collection = True
@classmethod
def exists(cls, **kwargs):
if not kwargs:
raise ValueError(u"`Exists` method requires at least one kwarg.")
# special case to check for a simple pk
if len(kwargs) == 1 and cls._field_is_pk(list(kwargs.keys())[0]):
# Match using standard pk in collection.
if cls.limpyd_collection:
return cls.get_field('pk').exists(list(kwargs.values())[0])
else:
# Macth custom collection, skip.
# TODO: add custom macth custom collection.
return True
try:
cls.collection(**kwargs).sort(by='nosort')[0]
except IndexError:
return False
else:
return True
Above allows the following without any errors:
NoCollectionInstance(id='1') CollectionInstance(id='2')
or any gets,
NoCollectionInstance(id='1').broadcasts.zrangebyscore(0,0)) CollectionInstance(id='2').broadcasts.zrangebyscore(0,0))
or deletes,
NoCollectionInstance(id='1').delete() CollectionInstance(id='2').delete()
Last, is the use of the collection manager. Initially, the collection manager was somewhat confusing. I still wanted to use a manager on limpyd_collection=Flase
models; this just required a custom manager to work.