Stream-Framework
An Example for filtering the Aggregated Activities?
How can I filter the aggregated activities? Suppose each aggregated activity has a custom attribute, e.g. active (boolean).
feed = newsfeed_stream.get_feeds(user.id)['aggregated']
# filtering here ???
1.) Python filtering: The feed is really fast, so you might be able to get away with filtering it in Python: [i for i in feed[:20] if i.is_active]. Of course, this causes problems with pagination, because a filtered page can come back with fewer items than requested.
2.) Keep two feeds: one feed for active and one feed for inactive activities.
3.) Cassandra secondary index: while this is the cleanest solution, it's also quite hard to do.
4.) Roll your own index.
5.) Override the read path with a Lua script that filters the results server-side.
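To make the pagination problem in option 1 concrete, here is a minimal sketch. The AggregatedActivity class and the fake feed are hypothetical stand-ins, not the library's own objects; only the slice-then-filter pattern comes from the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class AggregatedActivity:
    # Hypothetical stand-in for an aggregated activity with a custom flag.
    id: int
    is_active: bool


# Fake feed, newest first; every third activity is marked active.
feed = [AggregatedActivity(id=i, is_active=(i % 3 == 0)) for i in range(100, 0, -1)]


def filtered_page(feed, offset, limit):
    """Slice first, then filter: the page may contain fewer than `limit` items."""
    return [a for a in feed[offset:offset + limit] if a.is_active]


page = filtered_page(feed, 0, 20)
print(len(page))  # fewer than the 20 items that were asked for
```

The caller asked for 20 items but only gets the active ones that happened to fall inside the first 20 positions, so page sizes vary and a client cannot tell a short page from the end of the feed.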
Hi,
- No, option 1 won't work; as you said, it has issues with pagination.
- No, option 2 won't work either: active is just an example. I may have multiple filtering parameters, and keeping separate feeds for each is not reasonable.
- I am using Redis.
- @tbarbugli What do you mean by "Roll your own index"? And regarding point number 5: does it depend on point 4? If not, can you point me to the area of the code where the read path is?
What's the max length on your feeds?
I have not changed the length yet. It is the default.
Hi, my suggestions don't depend on each other (though you can use them together)
By rolling your own index I mean storing references to the activities that are active in a feed.
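A rough, in-memory sketch of that idea follows. The IndexedFeed class and its methods are hypothetical and are not part of the library; with Redis, the index would presumably be a second sorted set kept next to the feed, but that is an assumption and not something shown here.

```python
class IndexedFeed:
    """In-memory stand-in for a feed plus a hand-rolled index of active activities."""

    def __init__(self):
        self.activities = {}    # activity_id -> activity data
        self.timeline = []      # every activity id, newest first
        self.active_index = []  # ids of active activities only, newest first

    def add(self, activity_id, active):
        # Assumes activities are added in chronological order.
        self.activities[activity_id] = {"id": activity_id, "active": active}
        self.timeline.insert(0, activity_id)
        if active:
            self.active_index.insert(0, activity_id)

    def get_active(self, offset, limit):
        # Pagination runs over the index, so pages are always full until the end.
        ids = self.active_index[offset:offset + limit]
        return [self.activities[i] for i in ids]


indexed = IndexedFeed()
for activity_id in range(1, 11):
    indexed.add(activity_id, active=(activity_id % 2 == 0))

first_page = indexed.get_active(0, 3)
second_page = indexed.get_active(3, 3)
```

The price is bookkeeping: the index has to be updated whenever an activity is added, removed, or has its flag changed, and each additional filter attribute needs its own index.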
The Lua approach is pretty much the same as Thierry's Python filtering, but done on the server side.
The fastest way to hack this in is probably to create a subclass of RedisTimelineStorage and override get_slice_from_storage (which uses the get_results method of feedly.storage.redis.structures.sorted_set.RedisSortedSetCache, which you will probably also need to subclass).
If you are interested in exploring this option, I suggest you just dive into the RedisTimelineStorage code, since it's quite easy to read and navigate.
I personally would go with Thierry's option #1 on steroids:
- wrap the feed class's get_results method to allow filtering
- for low-cardinality fields, you can keep track of how frequent each value is (e.g. 10% of the activities are active)
- use the activity_id to know exactly where to start (otherwise you always need to walk the feed from the beginning)
- once you know where the slice starts, accumulate enough activities by fetching several times more than requested, scaled by how rare the value is (e.g. about 1.5x for a value that most activities have, 10x for a value that only 10% have), and return the requested amount
Note: selecting k inactive activities from a feed that contains only active activities results in reading the whole feed :/
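The steps above could be sketched roughly as follows. Everything here is illustrative: make_fetch, get_filtered_slice, and the dict-based activities are hypothetical names over an in-memory list, not the library's API, and the over-fetch factor is simply picked as about the inverse of the match frequency.

```python
def make_fetch(feed):
    """feed: list of dicts sorted by 'id' descending (newest first)."""
    def fetch(before_id, count):
        # Return up to `count` activities strictly older than `before_id`.
        older = [a for a in feed if before_id is None or a["id"] < before_id]
        return older[:count]
    return fetch


def get_filtered_slice(fetch, predicate, limit, before_id=None, overfetch=1.5):
    """Collect up to `limit` matching activities, reading in over-sized batches.

    Returns (matches, cursor); pass the cursor back as `before_id` for the next page.
    """
    results = []
    batch = max(1, int(limit * overfetch))
    while len(results) < limit:
        chunk = fetch(before_id, batch)
        if not chunk:
            break  # reached the end of the feed
        for activity in chunk:
            before_id = activity["id"]  # remember the last activity examined
            if predicate(activity):
                results.append(activity)
                if len(results) == limit:
                    break
    return results, before_id


# 100 activities, newest first; 10% of them are active.
feed = [{"id": i, "active": i % 10 == 0} for i in range(100, 0, -1)]
fetch = make_fetch(feed)


def is_active(activity):
    return activity["active"]


page1, cursor1 = get_filtered_slice(fetch, is_active, 3, overfetch=10)
page2, cursor2 = get_filtered_slice(fetch, is_active, 3, before_id=cursor1, overfetch=10)
```

Because the cursor is the id of the last activity examined, the next page resumes exactly where the previous one stopped instead of walking the feed from the top. A predicate that never matches still reads every batch until the feed runs out, which is the worst case described in the note above.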
Tommaso