
twitter: fully support oversized block lists

Open snarfed opened this issue 8 years ago • 2 comments

we recently started fetching and obeying twitter block lists in #473. more implementation details in snarfed/granary#40. we use twitter's blocks/ids endpoint (with count=5000), which is rate limited to 15 calls per 15m per user. this means that we can only fetch 75k members of a user's block list at one time.
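
for reference, a rough sketch of that arithmetic, not bridgy's actual code. the constant and function names are made up for illustration, and it assumes every call returns a full page of 5000 ids:

```python
import math

IDS_PER_CALL = 5000     # count=5000 on blocks/ids
CALLS_PER_WINDOW = 15   # rate limit: 15 calls per 15 minutes per user


def max_ids_per_window():
    """Most block list members we can fetch in one rate limit window."""
    return IDS_PER_CALL * CALLS_PER_WINDOW


def windows_needed(block_list_size):
    """How many 15 minute windows a block list of this size would take."""
    calls = math.ceil(block_list_size / IDS_PER_CALL)
    return math.ceil(calls / CALLS_PER_WINDOW)


print(max_ids_per_window())    # 75000
print(windows_needed(75000))   # 1
print(windows_needed(75001))   # 2
```

anything past 75k members needs at least a second window, so it can't finish inside a single poll.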

there are currently three bridgy twitter users with >75k users on their block lists. one has 149996 total (!); i haven't counted the others. they now hit the rate limit and fail pretty much every poll. we cache block lists, but we don't split and coalesce block list fetches across polls.

those three users' last webmentions from bridgy were 2 mos ago, 3 mos ago, and never, and the solutions to this are awkward at best, so i'm not prioritizing this right now.

snarfed avatar Aug 09 '17 00:08 snarfed

my stopgap solution to this is to gracefully handle when we get rate limited and use the partial block list contents that we've fetched so far (06f5987af2ae1e5b361c86d4c97ae44711659a40). the cap i ended up with is actually 40k, not 75k, since i memcache block lists, and memcache values are limited to 1MB.
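
a minimal sketch of that idea, not the code in the commit above. `fetch_page`, `RateLimited`, and `fetch_block_ids` are hypothetical names; `fetch_page(cursor)` stands in for one blocks/ids call and is assumed to return `(ids, next_cursor)` and to raise `RateLimited` on a 429. the `'-1'` start cursor and `'0'` end cursor follow twitter's cursoring convention:

```python
class RateLimited(Exception):
    """Raised by the hypothetical fetch_page when the API returns a 429."""


def fetch_block_ids(fetch_page, cap=40000):
    """Fetch block list ids page by page, keeping partial results.

    Stops when the list is complete (cursor '0'), when the cap is reached,
    or when we get rate limited. In every case, returns what we have so far.
    """
    ids = []
    cursor = '-1'  # twitter's cursor value for the first page
    while cursor != '0' and len(ids) < cap:
        try:
            page, cursor = fetch_page(cursor)
        except RateLimited:
            break  # use the partial block list instead of failing the poll
        ids.extend(page)
    return ids[:cap]
```

the tradeoff is that ids past the cutoff are silently ignored, so responses from those blocked users can still get through.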

snarfed avatar Aug 23 '17 18:08 snarfed

oof, hit a blocklist last night (vhfmag's) where 40k was still over 1MB. error here; log:

urlopen GET https://api.twitter.com/1.1/blocks/ids.json?count=5000&stringify_ids=true&cursor=-1 {} (local/lib/python2.7/site-packages/oauth_dropins/webutil/util.py:1316)
...
urlopen GET https://api.twitter.com/1.1/blocks/ids.json?count=5000&stringify_ids=true&cursor=1645386315131274403 {} (local/lib/python2.7/site-packages/oauth_dropins/webutil/util.py:1316)
Error 429, response body: u'{"errors":[{"message":"Rate limit exceeded","code":88}]}' (local/lib/python2.7/site-packages/oauth_dropins/webutil/util.py:1107)
Updating vhfmag (Twitter) /twitter/vhfmag : {u'poll_status': u'error', u'last_activity_id': u'1175780014345863169', u'last_public_post': datetime.datetime(2019, 9, 22, 14, 32, 55), u'recent_private_posts': 0} (models.py:262)
Values may not be more than 1000000 bytes in length; received 1087968 bytes (local/lib/python2.7/site-packages/webapp2.py:1590)
Traceback (most recent call last):
...
  File "tasks.py", line 95, in post
    self.poll(source)
  File "tasks.py", line 195, in poll
    self.backfeed(source, responses, activities=activities)
  File "tasks.py", line 329, in backfeed
    if source.is_blocked(resp):
  File "twitter.py", line 172, in is_blocked
    memcache.set(cache_key, self.blocked_ids, time=BLOCKLIST_CACHE_TIME)
...
  File "python27_lib/versions/1/google/appengine/api/memcache/__init__.py", line 238, in _validate_encode_value
    'received %d bytes' % (MAX_VALUE_SIZE, len(stored_value)))
ValueError: Values may not be more than 1000000 bytes in length; received 1087968 bytes

the memcache docs are light on size calculation details, but they do say "Any type. If complex, will be pickled." i guess i could pickle the list, measure, and incrementally cut it down until it's under 1MB, but for now i'm just going to drop the cutoff down to 35k.
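
for the record, the pickle-and-measure approach could look roughly like this. it's a sketch under assumptions, not something bridgy does: `trim_to_fit` and `step` are made-up names, and it uses the stdlib `pickle` default protocol, which may not match exactly how memcache serializes the value, so a real version would want some headroom under the limit:

```python
import pickle

MAX_VALUE_SIZE = 1000000  # bytes, the limit reported in the error above


def trim_to_fit(ids, max_bytes=MAX_VALUE_SIZE, step=1000):
    """Drop ids from the end until the pickled list fits in max_bytes.

    Assumption: stdlib pickle size approximates the size memcache measures.
    """
    ids = list(ids)  # copy so we don't modify the caller's list
    while ids and len(pickle.dumps(ids)) > max_bytes:
        del ids[-step:]  # cut the list down by one step and re-measure
    return ids
```

this re-pickles the whole list on each pass, which is wasteful but only costs a handful of iterations when the list is just slightly over the limit.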

snarfed avatar Sep 23 '19 18:09 snarfed

Obsolete, Bridgy Twitter is dead. https://github.com/snarfed/bridgy/issues/1410#issuecomment-1497763725

snarfed avatar Apr 06 '23 04:04 snarfed