Improve PLOS worker
It looks like the server's IP is blocked by the PLOS search API. We should contact them and improve the worker to honor the API's rate limits:
http://api.plos.org/solr/faq/#solr_api_recommended_usage
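To make "honor the rate limits" a bit more concrete, here is a minimal sketch of a throttle the worker could wrap around each API call. `RequestThrottle`, its `clock`/`sleeper` parameters, and whatever interval gets passed in are illustrative assumptions, not existing snpr code or figures confirmed by PLOS; the real interval should be taken from the FAQ linked above.

```ruby
# Spaces out calls so that consecutive requests start at least
# `min_interval` seconds apart. Hypothetical helper, not part of snpr.
class RequestThrottle
  def initialize(min_interval:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock            # injectable so the class can be tested without waiting
    @sleeper = sleeper
    @last_request_at = nil
  end

  # Waits if the previous call was too recent, then runs the block
  # and returns the block's value.
  def call
    if @last_request_at
      wait = @min_interval - (@clock.call - @last_request_at)
      @sleeper.call(wait) if wait > 0
    end
    @last_request_at = @clock.call
    yield
  end
end
```

In the worker this might look like `throttle.call { fetch_plos_results(snp) }`, where `fetch_plos_results` stands for whatever method performs the HTTP request. Note that the throttle only covers a single process; several Sidekiq threads or processes would each need to share the limit some other way.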
Working on the worker part.
See #98
We could also tweak the retry delay to reflect the API limits: https://github.com/mperham/sidekiq/wiki/Error-Handling
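A rough sketch of what such a retry delay could look like. Sidekiq lets a worker supply its own delay through the `sidekiq_retry_in` hook described on that wiki page; the function below only computes the number of seconds. The floor of 60 seconds and the cap of one hour are assumptions picked for illustration, not values from the PLOS documentation.

```ruby
# Returns the number of seconds to wait before retry number `count`
# (0 for the first retry). The delay doubles with every retry, starting
# at `floor` and never exceeding `cap`. Both defaults are assumed values.
def plos_retry_delay(count, floor: 60, cap: 3600)
  [floor * (2**count), cap].min
end

# Possible wiring inside the worker (needs Sidekiq, so left as a comment):
#
#   class PlosSearch
#     include Sidekiq::Worker
#     sidekiq_retry_in { |count| plos_retry_delay(count) }
#   end
```

The cap keeps a failed job from drifting days into the future, while the floor keeps quick successive retries from adding to the request count right after the API has rejected us.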
I'm still a bit confused about why the PLOS worker amasses so many jobs in the first place. I emptied the PLOS queue yesterday, and already there are 7,828 jobs in the PLOS queue, and 0 in the SNPedia and Mendeley queues. How come?
If we fix that, we should have far fewer requests to the PLOS API.
Might have something to do with this: https://github.com/gedankenstuecke/snpr/blob/1468394e4d1bc9c680a433d28b5e0fbbbc3006ce/app/controllers/snps_controller.rb#L32
So I guess it would be better to start these jobs only if the SNP's attributes haven't been updated in a month, like:
    if Date.today - 30.days > (@snp.plos_updated).to_date
      Sidekiq::Client.enqueue(PlosSearch, @snp.id)
    end
We already do that in the worker. What I was wondering is whether, instead of looking up the SNPs that are viewed, we want to look up 7500 SNPs (the daily API limit) that haven't been updated for a while, once a night, and enqueue them.
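A minimal sketch of that nightly selection, written against plain Ruby objects so it can be tried outside the app. `SnpRecord` is a stand-in for the real model, `stale_snp_ids` is a hypothetical helper name, and the 30-day threshold is taken from the snippet earlier in this thread; it also assumes every record already has a `plos_updated` date.

```ruby
require 'date'

# Stand-in for the Snp model; only the two fields used here.
SnpRecord = Struct.new(:id, :plos_updated)

# Returns the ids of up to `limit` records whose last PLOS update is
# more than `max_age_days` days before `today`, oldest first.
def stale_snp_ids(snps, today:, limit: 7500, max_age_days: 30)
  cutoff = today - max_age_days
  snps.select { |snp| snp.plos_updated < cutoff }
      .sort_by(&:plos_updated)
      .first(limit)
      .map(&:id)
end

# Possible use from a nightly task (needs the app and Sidekiq):
#
#   stale_snp_ids(Snp.all, today: Date.today).each do |id|
#     Sidekiq::Client.enqueue(PlosSearch, id)
#   end
```

Sorting oldest first means that if more SNPs are stale than the daily limit allows, the ones that have waited longest get refreshed first. In the app itself the filtering and limiting would probably be better done in the database query than in Ruby.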
Fixed in f6d13e6649b9657c70f1f8f664cde4cc09fb36df
Wait, no, that's not fixed.
@philippbayer any updates on this bug?
It's changed a little bit since then - there's now a lib/tasks/update_papers.rake which submits a worker that iterates over all papers and enqueues their jobs (Mendeley, PLOS, SNPedia) when they haven't been updated in 31 days.
Before, we just enqueued all three jobs for a SNP whenever its show page was opened, which clearly was way too much, especially when someone crawled all the info.