snpr Improve PLOS worker

It looks like the server's IP has been blocked by the PLOS search API. We should contact them and improve the worker so that it honors the API's rate limits:

http://api.plos.org/solr/faq/#solr_api_recommended_usage
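As a rough illustration of honoring a rate limit on our side, here is a minimal client-side throttle in Ruby that enforces a minimum gap between requests. The `Throttle` class and the 6-second interval used below are hypothetical placeholders, not the figures from the PLOS FAQ. The lock only covers threads inside one process, so several Sidekiq processes would still need a shared limiter.

```ruby
# Hypothetical client-side throttle: guarantees at least `min_interval`
# seconds between the starts of consecutive #throttle calls in this process.
# The clock and sleeper are injectable so the logic can be tested.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_call = nil
    @mutex = Mutex.new
  end

  # Sleeps if the previous call was too recent, then runs the block
  # and returns its value.
  def throttle
    @mutex.synchronize do
      now = @clock.call
      if @last_call
        wait = @min_interval - (now - @last_call)
        if wait > 0
          @sleeper.call(wait)
          now += wait
        end
      end
      @last_call = now
    end
    yield
  end
end

# Placeholder interval; check the PLOS FAQ for the real limits.
PLOS_THROTTLE = Throttle.new(6.0)
```

A worker would then wrap each search call, e.g. `PLOS_THROTTLE.throttle { fetch_plos_results(snp) }`, where `fetch_plos_results` is a hypothetical name for whatever performs the HTTP request.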

Dec 07 '13 11:12 tsujigiri

Working on the worker part.

Dec 07 '13 11:12 tsujigiri

See #98

Dec 08 '13 13:12 tsujigiri

We could also tweak the retry delay to reflect the API limits: https://github.com/mperham/sidekiq/wiki/Error-Handling
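Sidekiq lets a worker override its retry schedule with `sidekiq_retry_in`. Below is a sketch of a slower schedule, assuming we want retries spaced out to roughly match an hourly API window; the one-hour base, the doubling, and the one-day cap are placeholder choices, and `plos_retry_delay` is a hypothetical helper.

```ruby
# Placeholder values, not taken from the PLOS documentation.
RETRY_BASE_SECONDS = 60 * 60      # one hour
RETRY_CAP_SECONDS  = 24 * 60 * 60 # one day

# Delay in seconds before retry number `count`: doubles each time,
# never more than a day.
def plos_retry_delay(count)
  [RETRY_BASE_SECONDS * (2**count), RETRY_CAP_SECONDS].min
end

# Wired into the worker via Sidekiq's retry hook, roughly:
#
#   class PlosSearch
#     include Sidekiq::Worker
#     sidekiq_retry_in { |count| plos_retry_delay(count) }
#   end
```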

Dec 10 '13 08:12 tsujigiri

I'm still a bit confused about why the PLOS worker amasses so many jobs in the first place. I emptied the PLOS queue yesterday, and there are already 7,828 jobs in it again, but 0 in the SNPedia and Mendeley queues. How come?

If we fix that, we should make far fewer requests to the PLOS API.

Dec 11 '13 04:12 philippbayer

Might have something to do with this: https://github.com/gedankenstuecke/snpr/blob/1468394e4d1bc9c680a433d28b5e0fbbbc3006ce/app/controllers/snps_controller.rb#L32

Dec 11 '13 09:12 tsujigiri

So I guess it would be better to start these jobs only if the SNP's attributes haven't been updated in a month, like:

if Date.today - 30.days > (@snp.plos_updated).to_date
    Sidekiq::Client.enqueue(PlosSearch, @snp.id)
end

Dec 12 '13 00:12 philippbayer

We already do that in the worker. What I was wondering is whether, instead of looking up the SNPs that get viewed, we should once a night look up 7500 SNPs (the daily API limit) that haven't been updated for a while and enqueue them.
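The nightly approach could look roughly like this: select at most the daily budget of SNPs whose PLOS data is missing or oldest, then enqueue one job per SNP. `Snp` here is a plain Struct standing in for the model, and `nightly_plos_batch` and the 30-day cutoff are assumptions for illustration.

```ruby
require "date"

# Hypothetical stand-in for the Snp model; only the fields used here.
Snp = Struct.new(:id, :plos_updated)

DAILY_PLOS_BUDGET = 7500 # daily limit as quoted in this thread
PLOS_MAX_AGE_DAYS = 30   # assumed staleness cutoff

# Returns up to `budget` SNPs whose PLOS data is missing or older than
# `max_age` days: never-updated SNPs first, then the oldest timestamps.
def nightly_plos_batch(snps, today: Date.today, budget: DAILY_PLOS_BUDGET,
                       max_age: PLOS_MAX_AGE_DAYS)
  cutoff = today - max_age
  stale = snps.select { |s| s.plos_updated.nil? || s.plos_updated < cutoff }
  # Sort key: [0, ...] for nil timestamps so they come before dated ones.
  stale.sort_by { |s| [s.plos_updated ? 1 : 0, s.plos_updated || today] }
       .first(budget)
end
```

Each selected SNP would then be passed to `Sidekiq::Client.enqueue(PlosSearch, snp.id)` as in the snippet above. In the app itself, the selection would be better done as a single database query ordered by `plos_updated` with a limit, rather than loading every SNP into memory.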

Dec 12 '13 08:12 tsujigiri

Fixed in f6d13e6649b9657c70f1f8f664cde4cc09fb36df

Aug 26 '14 06:08 philippbayer

Wait, no, that's not fixed.

Aug 26 '14 06:08 philippbayer

@philippbayer, any updates on this bug?

Aug 06 '16 12:08 raivivek

It's changed a little since then: there's now a lib/tasks/update_papers.rake that submits a worker, which iterates over all papers and enqueues their jobs (Mendeley, Plos, Snpedia) when they haven't been updated in 31 days. Before, we simply enqueued all three jobs for a SNP whenever its show page was opened, which was clearly way too much, especially when someone crawled all the info.
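The per-record check described for `lib/tasks/update_papers.rake` can be sketched as follows. This is not the actual task: the timestamp field names, the job names in `SOURCES`, and the injected `enqueue` callable are hypothetical, chosen so the logic can run outside Rails.

```ruby
require "date"

# Hypothetical mapping from each source's timestamp field to the job that
# refreshes it; the real field and worker names in snpr may differ.
SOURCES = {
  mendeley_updated: :MendeleySearch,
  plos_updated:     :PlosSearch,
  snpedia_updated:  :SnpediaSearch
}.freeze

REFRESH_AFTER_DAYS = 31

# Hypothetical record with one "last updated" date per source.
Record = Struct.new(:id, :mendeley_updated, :plos_updated, :snpedia_updated)

# Calls `enqueue.call(job, id)` for every source whose data is missing or
# older than `max_age` days, and returns the [job, id] pairs it enqueued.
def enqueue_stale_jobs(records, enqueue:, today: Date.today, max_age: REFRESH_AFTER_DAYS)
  cutoff = today - max_age
  records.flat_map do |rec|
    SOURCES.filter_map do |field, job|
      updated = rec.public_send(field)
      next unless updated.nil? || updated < cutoff

      enqueue.call(job, rec.id)
      [job, rec.id]
    end
  end
end
```

In the app, `enqueue` could be `->(job, id) { Sidekiq::Client.enqueue(Object.const_get(job), id) }`, so only stale sources generate API requests instead of every page view triggering all three.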

Aug 07 '16 05:08 philippbayer