Stream-Framework

An Example for filtering the Aggregated Activities?

Open intellisense opened this issue 10 years ago • 6 comments

How can I filter the aggregated activities? Suppose the aggregated activities have a custom attribute, e.g. active (boolean).

feed = newsfeed_stream.get_feeds(user.id)['aggregated']
# filtering here ???

intellisense avatar Feb 03 '14 23:02 intellisense

1.) Python filtering: The feed is really fast, so you might be able to get away with filtering it in Python, e.g. [i for i in feed[:20] if i.is_active]. Of course, this causes issues with pagination.

2.) Keep two feeds: One feed for active and one feed for inactive activities.

3.) Cassandra secondary index: While this is the cleanest solution, it's also quite hard to do.
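
To make the pagination issue in option 1 concrete, here is a minimal sketch using plain Python objects as stand-ins for aggregated activities. The `Activity` class, its `is_active` attribute, and `naive_page` are hypothetical and not part of the library; the point is only that slicing first and filtering afterwards yields pages of unpredictable length.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Hypothetical stand-in for an aggregated activity.
    id: int
    is_active: bool


def naive_page(feed, page, page_size=20):
    """Slice first, then filter: the page may end up shorter than page_size."""
    start = page * page_size
    return [a for a in feed[start:start + page_size] if a.is_active]


# Example data: every third activity is active.
feed = [Activity(id=i, is_active=(i % 3 == 0)) for i in range(100)]
first_page = naive_page(feed, page=0)
# first_page holds only the active items among the first 20 entries,
# so it is shorter than the requested page size.
```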

tschellenbach avatar Feb 04 '14 09:02 tschellenbach

4.) Roll your own index

5.) Override the read path with a Lua script that filters the results server-side

tbarbugli avatar Feb 04 '14 09:02 tbarbugli

Hi,

  1. No, this won't work; as you said, it has issues with pagination.
  2. No, active is just an example. I may have multiple filtering parameters, and keeping a separate feed for each is not reasonable.
  3. I am using Redis.
  4. @tbarbugli What do you mean by "roll your own index"? And regarding point 5: does it depend on point 4? If not, can you point me to the area of the code where the read path lives?

intellisense avatar Feb 04 '14 11:02 intellisense

What's the max length on your feeds?

tschellenbach avatar Feb 04 '14 12:02 tschellenbach

I have not changed the length yet. It is the default.

intellisense avatar Feb 04 '14 12:02 intellisense

Hi, my suggestions don't depend on each other (though you can use them together)

By rolling your own index, I mean storing references to the activities that are active in a feed.
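
One way to read this, sketched in plain Python rather than Redis: next to the feed, keep a second sorted structure that holds only the ids of activities matching the filter, and paginate over that. The `ActiveIndex` class and its methods are hypothetical, and the sketch assumes activity ids grow over time; in Redis the same idea could presumably map to an extra sorted set per feed.

```python
import bisect


class ActiveIndex:
    """Hypothetical secondary index: ids of active activities only."""

    def __init__(self):
        # Kept sorted ascending; read in reverse for newest-first order.
        self._ids = []

    def add(self, activity_id):
        pos = bisect.bisect_left(self._ids, activity_id)
        if pos == len(self._ids) or self._ids[pos] != activity_id:
            self._ids.insert(pos, activity_id)

    def remove(self, activity_id):
        pos = bisect.bisect_left(self._ids, activity_id)
        if pos < len(self._ids) and self._ids[pos] == activity_id:
            del self._ids[pos]

    def page(self, offset, limit):
        """Return up to `limit` ids, newest first, skipping `offset` ids."""
        newest_first = self._ids[::-1]
        return newest_first[offset:offset + limit]


# Example: index only the even ids as "active".
index = ActiveIndex()
for activity_id in range(10):
    if activity_id % 2 == 0:
        index.add(activity_id)
index.remove(6)  # activity 6 became inactive
```

Because the index contains only matching ids, every page has the requested size until the index runs out; the cost is that the index has to be updated whenever an activity's active flag changes.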

The Lua approach is pretty much the same as Thierry's Python filtering, but done on the server side.

The fastest and easiest way to hack this in is probably to create a subclass of RedisTimelineStorage and override get_slice_from_storage (which uses the get_results method of feedly.storage.redis.structures.sorted_set.RedisSortedSetCache, which you will probably also need to subclass).

If you are interested in exploring this option, I suggest you just dive into the RedisTimelineStorage code, since it's quite easy to read and navigate.

Personally, I would go with Thierry's option #1 on steroids:

  1. Wrap the feed class's get_results method to allow filtering.
  2. For low-cardinality fields, you can keep track of how frequent each value is (e.g. 10% of the activities are active).
  3. Use the activity_id to know exactly where to start (otherwise you always need to walk the feed from the beginning).
  4. Once you know where the slice starts, accumulate enough activities by fetching several times more activities than requested (e.g. 1.5x more when looking for active activities, 10x in the other case) and return the requested amount.

Note: selecting k inactive activities from a feed that contains only active activities results in reading the whole feed :/
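
A rough sketch of steps 3 and 4, assuming the feed can be read as a newest-first list of dicts. The function `filtered_slice` and its parameters (`predicate`, `start_after_id`, `overfetch`) are hypothetical names chosen for illustration, not part of the library; a real version would read chunks from the storage backend instead of slicing a list.

```python
def filtered_slice(feed, predicate, limit, start_after_id=None, overfetch=1.5):
    """Return up to `limit` activities from `feed` that match `predicate`.

    feed: list of activities ordered newest first (stand-in for the real feed)
    start_after_id: id of the last activity of the previous page, if any
    overfetch: how many times `limit` to read per fetch
    """
    # Step 3: use the last seen activity id to find where to start.
    # If the id is not found, the scan starts from the beginning.
    pos = 0
    if start_after_id is not None:
        for i, activity in enumerate(feed):
            if activity["id"] == start_after_id:
                pos = i + 1
                break

    chunk = max(1, int(limit * overfetch))
    results = []
    # Step 4: keep fetching chunks until enough matches are collected
    # or the feed is exhausted.
    while len(results) < limit and pos < len(feed):
        batch = feed[pos:pos + chunk]
        for activity in batch:
            if predicate(activity):
                results.append(activity)
                if len(results) == limit:
                    break
        pos += len(batch)
    return results


def is_active(activity):
    return activity["active"]


# Example data: ids 30..1, newest first, every third one active.
feed = [{"id": i, "active": i % 3 == 0} for i in range(30, 0, -1)]
page1 = filtered_slice(feed, is_active, limit=4)
page2 = filtered_slice(feed, is_active, limit=4, start_after_id=page1[-1]["id"])
```

The `overfetch` factor is where the frequency tracking from step 2 would plug in: the rarer the value being filtered for, the larger the factor needs to be to fill a page in few fetches.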

Tommaso

tbarbugli avatar Feb 04 '14 21:02 tbarbugli