Hal Roberts
looks good to me. nice catch.
just noting here that once we are assigning youtube channels to media sources, we could actually create a meaningful first class youtube topic by creating a 'url sharing' topic that just...
the jobs were still dying every 20 minutes or so, so I decreased the job size to 20k. it is catching up now. On Tue, Nov 26, 2019 at 1:23...
The updated code does not use the tokenize module at all. It just uses a regex including \w and punctuation relevant to solr queries: https://github.com/berkmancenter/mediacloud/blob/master/mediacloud/mediawords/solr/query.py#L696 The solr queries are much...
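For anyone following along, here is a rough sketch of what regex-only tokenizing of a solr query can look like. The pattern and the `tokenize_query` name are made up for illustration; this is not the regex in query.py, just the general idea of matching runs of \w plus single punctuation characters that matter in solr syntax.

```python
import re

# Hypothetical pattern (NOT the one in query.py): either a run of word
# characters, or one punctuation character that is meaningful in solr syntax.
TOKEN_RE = re.compile(r'\w+|[()\[\]{}:"+\-~^*?!/]')


def tokenize_query(query):
    """Split a solr-style query into word and punctuation tokens.

    Whitespace and any character not covered by the pattern are skipped.
    """
    return TOKEN_RE.findall(query)


if __name__ == "__main__":
    print(tokenize_query('title:"net neutrality" AND (media_id:1 OR obama*)'))
```

Since nothing here goes through the python tokenizer, query text that is not valid python source is not a problem for this approach.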
The top level stories/list call is now fast enough not to time out. The lists for individual media sources are still very slow for large topics.
You should put the api key into the configuration file (mediawords.yml) and access it via mediawords.util.config.get_config(). You should just return all of the stories each time. I will deal with...
we also have access to associated press support folks if we need to ask questions.
just get the max 100 you can get from a single page. the crawler will be running this function up to every five minutes depending on how often it returns...
this looks great. On Wed, May 15, 2019 at 11:07 PM Jason Michael Baumgartner < [email protected]> wrote: > The code is working well. Here is an example of the output...
nice work! A couple things, with apologies for taking so long to fiddle to get the integration right. I think we're very close to being able to plug this in. *...