
Add edge-case handling when retrieving prolific user history

Open FoxxMD opened this issue 1 year ago • 0 comments

Due to limits in reddit's infrastructure, basically every search vector is limited to returning at most 1000 items.

The consequence of this is that when CM retrieves a user's profile history only the last 1000 submissions/comments/overview items can possibly be retrieved.

In the event the user is prolific and the fetch window is time-based, it is possible we run out of items before the start of the time window is reached. Additionally, reddit's listing response doesn't differentiate between max exhaustion (hitting the 1000-item limit) and an actual "no more items".

Document limitation

  • [ ] This should be added to the docs so operators/moderators are aware of the current limitation

Detect and handle max exhaustion cases

  • [ ] We should detect when the 1000 items returned do not cover the full window when the window is time-based
  • [ ] Add config option to window to fail if exhausted or make do with given data?
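
A minimal sketch of how the detection and the config option could fit together. Everything here is an assumption for illustration and not existing CM code: the `Activity` shape, the `classifyTimeWindow` and `applyPolicy` helpers, and the `"fail" | "continue"` policy values are hypothetical names. Since the listing response does not say why it ended, the check is only a heuristic: if we hold at least the cap's worth of items and the oldest one is still newer than the window start, we probably hit the limit rather than the end of the user's history.

```typescript
// Hypothetical shape: only the creation time matters for this check.
interface Activity {
  created_utc: number; // epoch seconds, as in reddit's API responses
}

type WindowStatus = "satisfied" | "likely-exhausted" | "history-ended";

// Hypothetical config values for the proposed window option.
type ExhaustionPolicy = "fail" | "continue";

const LISTING_CAP = 1000; // the limit described in this issue

// Classify a finished fetch for a time-based window.
function classifyTimeWindow(
  fetched: Activity[],
  windowStartUtc: number,
  cap: number = LISTING_CAP
): WindowStatus {
  if (fetched.length === 0) return "history-ended";
  const oldest = fetched.reduce(
    (min, a) => Math.min(min, a.created_utc),
    Infinity
  );
  // We reached (or passed) the start of the window: nothing is missing.
  if (oldest <= windowStartUtc) return "satisfied";
  // Window not covered. At the cap this is probably max exhaustion;
  // below the cap the user most likely has no older activity.
  return fetched.length >= cap ? "likely-exhausted" : "history-ended";
}

// Returns true if processing should go on with the data we have;
// throws when the operator asked to fail on exhaustion.
function applyPolicy(status: WindowStatus, policy: ExhaustionPolicy): boolean {
  if (status === "likely-exhausted" && policy === "fail") {
    throw new Error("History window exhausted before time range was covered");
  }
  return true;
}
```

With `"continue"` the caller would make do with the partial data; with `"fail"` the error could be surfaced to the operator, which is the choice the second checkbox asks about.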

Handle single subreddit source use-case with different search

If the window is filtering to a specific subreddit we can use a different search vector to (probably) get more results. This is because user profile search requires us to get all activities and then filter to the subreddit, whereas searching within a subreddit only returns activities from that sub to begin with.

If max exhaustion occurs in this scenario we could fall back to getting a listing with this query:

https://www.reddit.com/r/SUBREDDIT/search/?q=author%3AUSERNAME&restrict_sr=1&sr_nsfw=&type=comment
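
As a rough sketch, the URL above could be built like this. The helper name `buildAuthorSearchUrl` is hypothetical; the query parameters are copied from the URL above, and only the subreddit and username are substituted in.

```typescript
// Hypothetical helper: fills SUBREDDIT and USERNAME into the
// fallback search URL from this issue.
function buildAuthorSearchUrl(subreddit: string, username: string): string {
  const sub = encodeURIComponent(subreddit);
  // encodeURIComponent turns "author:NAME" into "author%3ANAME"
  const q = encodeURIComponent(`author:${username}`);
  return (
    `https://www.reddit.com/r/${sub}/search/` +
    `?q=${q}&restrict_sr=1&sr_nsfw=&type=comment`
  );
}
```

Note that `type=comment` is fixed here, as in the URL above, so this only covers comments; whether submissions need a separate query is left open.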

FoxxMD avatar May 18 '23 13:05 FoxxMD