Hal Roberts comments

Results: 39 comments of Hal Roberts

In attach_story_data_to_stories(), don't overwrite existing fields

looks good to me. nice catch.

treat YT videos encountered when spidering as special content

just noting here that once we are assigning youtube channels to media sources, we could actually create a meaningful first-class youtube topic by creating a 'url sharing' topic that just...

slow solr imports

the jobs were still dying every 20 minutes or so, so I decreased the job size to 20k. it is catching up now. On Tue, Nov 26, 2019 at 1:23...

replace 're' module with 'regex' module

The updated code does not use the tokenize module at all. It just uses a regex including \w and punctuation relevant to solr queries: https://github.com/berkmancenter/mediacloud/blob/master/mediacloud/mediawords/solr/query.py#L696 The solr queries are much...
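As a rough illustration of the regex approach described in this comment, here is a minimal sketch. It is not the code at the linked line: the pattern, the punctuation set, and the `tokenize_query` name are assumptions made for the example, and it uses the standard-library `re` module only to stay self-contained.

```python
import re

# Hypothetical pattern (an assumption, not the project's actual regex):
# match either a run of word characters (\w+) or one punctuation
# character that carries meaning in a Solr query.
TOKEN_RE = re.compile(r'\w+|[()\[\]{}:"+\-*~^]')


def tokenize_query(query: str) -> list:
    """Split a Solr-style query string into word and punctuation tokens."""
    return TOKEN_RE.findall(query)


if __name__ == "__main__":
    print(tokenize_query("title:(foo AND bar)"))
```

Whitespace is dropped because it matches neither alternative, so only words and the listed punctuation come back as tokens.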

topics/stories/list is too slow

The top level stories/list call is good enough not to time out now. The lists for individual media sources are still very slow for large topics.

add support for new associated press api

You should put the api key into the configuration file (mediawords.yml) and access it via mediawords.util.config.get_config(). You should just return all of the stories each time. I will deal with...
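A minimal sketch of the configuration lookup suggested in this comment. The `associated_press` / `apikey` key names and the `get_ap_api_key` helper are hypothetical, and a stand-in function replaces `mediawords.util.config.get_config()` so the example runs on its own.

```python
from typing import Any, Callable, Dict


def get_ap_api_key(get_config: Callable[[], Dict[str, Any]]) -> str:
    """Read the AP API key from the parsed configuration.

    The 'associated_press' and 'apikey' key names are assumptions made
    for this sketch; the real names in mediawords.yml may differ.
    """
    config = get_config()
    try:
        return config["associated_press"]["apikey"]
    except KeyError as ex:
        raise ValueError("AP API key is not set in the configuration") from ex


def fake_get_config() -> Dict[str, Any]:
    """Stand-in for mediawords.util.config.get_config() (illustrative only)."""
    return {"associated_press": {"apikey": "DUMMY-KEY"}}


if __name__ == "__main__":
    print(get_ap_api_key(fake_get_config))
```

Passing the config getter as an argument keeps the key out of the source code and makes the lookup easy to test without a real configuration file.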

add support for new associated press api

we also have access to associated press support folks if we need to ask questions.

add support for new associated press api

just get the max 100 you can get from a single page. the crawler will be running this function up to every five minutes depending on how often it returns...

add support for new associated press api

this looks great. On Wed, May 15, 2019 at 11:07 PM Jason Michael Baumgartner < [email protected]> wrote: > The code is working well. Here is an example of the output...

add support for new associated press api

nice work! A couple of things, with apologies for taking so long to fiddle to get the integration right. I think we're very close to being able to plug this in. *...