reader
reader copied to clipboard
Different feed update frequencies
#307 needs a way of changing the feed update frequency. Just like https://github.com/lemon24/reader/issues/96#issuecomment-1236304134, this should be a configurable global/per-feed strategy.
Use cases:
- different recurring frequencies (1h, 6h, 1d); a naive #307 implementation could bump the feed up to the next less frequent value
- no earlier than X; can be used by #307 to react to Retry-After
- only new feeds; special case that already exists for
update_feeds(new=True) - never/infinity; special case that already exists for
update_feeds(updates_enabled=True)- although, updates_enabled should be independent (e.g. if I temporarily disable updates, the configured frequency should not be deleted)
- maybe adding some kind of jitter to regular updates would be nice / polite
A unified way of updating feeds would be nice as well, e.g. "just call this every minute". Currently, I'm running the following in a cron:
- every hour:
update - every minute
update --new-only; search update
Logic is easy enough (turning update_after etc. to/from seconds not shown):
import random
from dataclasses import dataclass
@dataclass
class Feed:
url: str
update_period: int
update_after: int = 0
# last_updated is only set when the feed is actually updated
# (not when it's not modified, not when there was an exception)
# https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L276
# https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L445
last_retrieved: int = 0
def get_feeds_for_update(feeds, now):
return [f for f in feeds if f.update_after <= now]
def next_period(feed, now, jitter_ratio=0):
jitter = random.random() * jitter_ratio
current_period_no = now // feed.update_period
return (current_period_no + 1 + jitter) * feed.update_period
def update_feeds(feeds, now, get_update_after=next_period):
to_update = get_feeds_for_update(feeds, now)
for feed in to_update:
feed.last_retrieved = now
feed.update_after = get_update_after(feed, now)
return to_update
def set_update_period(feed, update_period):
feed.update_period = update_period
feed.update_after = next_period(feed, feed.last_retrieved)
Tests:
from collections import Counter
from functools import partial
import pytest
@pytest.mark.parametrize('old_after, new_after, now', [
(0, 10, 0),
(0, 10, 1),
(0, 10, 9.999),
(0, 20, 10),
(5, 10, 5),
(10, 20, 10),
(105, 110, 109),
(105, 120, 110),
(105, 200, 199.999),
(105, 210, 200),
])
def test_update(old_after, new_after, now):
feeds = [Feed('one', 10, old_after)]
assert len(update_feeds(feeds, now)) == 1
assert feeds == [Feed('one', 10, new_after, now)]
@pytest.mark.parametrize('old_after, now', [
(5, 4),
(10, 9.999),
(20, 19),
])
def test_no_update(old_after, now):
feeds = [Feed('one', 10, old_after)]
assert len(update_feeds(feeds, now)) == 0
assert feeds == [Feed('one', 10, old_after)]
@pytest.mark.parametrize('get_update_after', [
next_period,
# jitter ratio less than 10-1, to account for time step
partial(next_period, jitter_ratio=.9),
])
def test_sweep(get_update_after):
feeds = [Feed('one', 10), Feed('two', 20), Feed('three', 100)]
counts = Counter()
for now in range(100):
for feed in update_feeds(feeds, now, get_update_after):
counts[feed.url] += 1
assert counts == {'one': 10, 'two': 5, 'three': 1}
def test_set_period_up():
feeds = [Feed('one', 10)]
update_feeds(feeds, 5)
set_update_period(feeds[0], 20)
# no update needed, already updated in this period
assert len(update_feeds(feeds, 15)) == 0
def test_set_period_down():
feeds = [Feed('one', 20)]
update_feeds(feeds, 5)
set_update_period(feeds[0], 10)
# update needed, if taking new period into account
assert len(update_feeds(feeds, 15)) == 1
On to the API!
Update: We can get rid of last_retrieve and rely on the current time in set_update_period(); all tests pass with minimal changes.
API
Add update_feeds(scheduled: bool | None = None) argument that filters feeds to update:
- if none: if global tag
.reader.updateis set (regardless of value), assume true, else assume false - if true: update only
update_after < now - if false: update everything (no
update_afterfiltering)
In reader 4.0 (#291), we can make scheduled default to True (or just change the default behavior).
To configure the update interval, we can use a .reader.update tag:
- can be set globally or per feed, feed overrides global overrides default
- format:
interval: int(seconds),jitter: float|bool(in [0,1])- for interval, we could also use cron expressions, or at least
@hourly, @daily, @weekly, @monthlyvalues
- for interval, we could also use cron expressions, or at least
- default:
{interval: 3600, jitter: 0.25}
Using tags is good because it allows configuring stuff without additional UI.
Possible cons (WIP):
- We need to find a way to reset
update_afterwhen the update interval changes (a dedicatedset_feed_update_period()method can take care of this out of the box). One solution may be being able to register hooks on tag updates. - We want to validate the config. How do we handle validation errors? Ignore / use default, or raise to the user? The latter may be accomplished with a hook, although a standard validation mechanism would be better.
In the (low level) storage API:
- add
update_afterandlast_retrievedto FeedUpdateIntent and FeedForUpdate get_feeds_for_update(after: datetime | None)(also returns nulls)newuseslast_updated, it should uselast_retrieved
set_feed_update_after()(used when setting the interval)
To do (minimal):
- [x] add update_after and last_retrieved on Feed, FeedUpdateIntent, and FeedForUpdate
- [x] get_feeds_for_update(after)
- [x] (new should use last_retrieved)
- [x] update_feeds(scheduled) / get_feeds(scheduled) / get_feed_counts(scheduled)
- [x] update logic (just ignore invalid tag values and use default)
- [x] expose scheduled in the CLI
- [x] make the cli_status plugin entries make sense in a --scheduled world
- [x] tests
- documentation
- [x] docstrings
- [x] user guide
- [x] changelog, dev notes
Later:
- reset update_after after interval changes / set_feed_update_after()
- scheduled=None (default to false initially)
- config validation
update_after and last_retrieved should go on FeedUpdateIntent, and in turn FeedUpdateIntent should have union of FeedData-with-extra-stuff or exception or None, but I'm having trouble finding a good name for FeedData-with-extra-stuff.
For reference, here's all the feed-related data classes and how they're used:
.-- FeedForUpdate ---.
v |
parser |
| |
ParsedFeed storage -.
(has FeedData) ^ |
v | |
updater | |
| | |
|- FeedUpdateIntent -' Feed
| (has ??? (has FeedData) |
| or ExceptionInfo |
| or None) |
| v
'---- UpdateResult -----> user
(has UpdatedFeed)