
Caching bug with vote task

Open • pqnelson opened this issue 7 years ago • 1 comment

Hello,

In the vote_ids_for_house() function, the logic is to download the vote index page and parse it for the matching vote links (line 71 et seq.).
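Here is a minimal sketch of that index-parsing step, for illustration only. It assumes, hypothetically, that each roll-call link on the index page has an href ending in rollnumber=<N>. The names VoteLinkParser and vote_ids_from_index are made up and are not the project's code. The sketch shows that the vote IDs it returns can only be as recent as the HTML it is given.

```python
import re
from html.parser import HTMLParser

# Hypothetical link pattern: assumes each roll-call href ends in
# "rollnumber=<N>". The real index page format may differ.
VOTE_LINK = re.compile(r"rollnumber=(\d+)$")


class VoteLinkParser(HTMLParser):
    """Collects roll-call numbers from <a href=...> tags on an index page."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        match = VOTE_LINK.search(href)
        if match:
            self.numbers.append(int(match.group(1)))


def vote_ids_from_index(html, session):
    """Return IDs like 'h208-2017' for every matching link in html."""
    parser = VoteLinkParser()
    parser.feed(html)
    return [f"h{n}-{session}" for n in sorted(set(parser.numbers))]


sample = ('<a href="roll.asp?rollnumber=207">207</a>'
          '<a href="roll.asp?rollnumber=208">208</a>'
          '<a href="/about">About</a>')
print(vote_ids_from_index(sample, 2017))  # ['h207-2017', 'h208-2017']
```

If the HTML passed in comes from a cached copy of the index, any vote added after that copy was saved never shows up in the returned list.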

The problem is this: if I run the crawler on the night of Monday 3 April 2017, I get the votes up to h208-2017. If I then execute run votes --congress=115 --session=2017 --log=info at 2:30pm the next day, the crawler does not re-download the vote index, because it has already been cached.

As a result, it won't pick up votes h209 through h212, which took place after the cached index page was downloaded.

I think the fix is to modify the options passed to the utils.download() function on line 75 so that it always re-downloads the index. (Likewise, line 125 should be changed to something like utils.merge(options, {'binary': True, 'force': True}) to avoid the same bug for Senate votes.)
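A self-contained sketch of the bug and the proposed fix, under stated assumptions: download() and merge() below are hypothetical stand-ins, not the project's actual utils.download() or utils.merge(), and the network is replaced by a fetch callback. It shows that a cache-first download keeps returning the stale index until 'force' is set.

```python
import os
import tempfile


def merge(*dicts):
    """Hypothetical stand-in for a utils.merge-style helper:
    combine dicts left to right, with later keys winning."""
    result = {}
    for d in dicts:
        result.update(d)
    return result


def download(url, cache_path, options, fetch):
    """Hypothetical cache-first download.

    Reads cache_path if it exists, unless options['force'] is truthy.
    Other options (e.g. 'binary') are accepted but ignored in this sketch.
    fetch is a caller-supplied function url -> str standing in for the network.
    """
    if not options.get("force") and os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()  # stale copy: votes added since are invisible
    body = fetch(url)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(body)
    return body


cache = os.path.join(tempfile.mkdtemp(), "index.html")
options = {"binary": True}

# Monday night: the index is fetched and cached.
download("index", cache, options, lambda u: "votes up to h208")

# Tuesday afternoon: the live index has moved on.
tuesday = lambda u: "votes up to h212"
print(download("index", cache, options, tuesday))  # votes up to h208
print(download("index", cache, merge(options, {"force": True}), tuesday))  # votes up to h212
```

Setting force only on the index request, as proposed, would presumably leave caching of the individual vote pages untouched, which is narrower than forcing every download.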

pqnelson, Apr 04 '17 21:04

I always run it with --force --fast. I'd be OK with changing it so that this is the default behavior.

JoshData, Apr 05 '17 13:04