backend
Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
On the Covid-sourdough topic (4138) we are using for the upcoming ICWSM tutorial, I note [that Instagram is the most linked-to source](https://topics-dev.mediacloud.org/#/topics/4138/media?focusId=&q=&snapshotId=5100&timespanId=975868). However, [that source (104129) is a weird...
To make some technical decisions I think we need to more concretely design the primary use cases we have in mind so far. Here's my stab at a list, and...
We can use TensorFlow, Keras layers and a pre-trained model (like ResNet or MobileNet) to classify story images. We can adapt pre-existing models or create our own - it depends...
I'm seeing that stories from the LA Times have the content of the story repeated multiple times in the same story object in our database. This is a data...
A user found a story at one point, but when returning to their query later couldn't find the same one, so they emailed us. The story id in question is...
All of our crawler_fetcher and fetch_link workers were blocking on create_missing_partitions(). create_missing_partitions() was blocked on an autovacuum of the stories table. I'm not sure how to fix this long term....
Like the other topic discovery plugins, we need to add a plugin for ingesting matching YouTube videos into a topic, extracting links from the description and/or comments, and saving it...
We've decided that we can retrieve a useful set of content from CrowdTangle, so we need to add a plugin that lets us discover and ingest content via their API...
We've come up with a short-term idea for shifting the sitemap ingest process to researchers. The idea is that web users could request a source's sitemaps be fetched (via [ultimate-sitemap-parser](https://github.com/berkmancenter/mediacloud-ultimate-sitemap-parser)),...
We should replace smart quotes and long dashes in Solr queries at the API level.
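A minimal sketch of what that replacement could look like, assuming it runs on the raw query string before it is handed to Solr. The function name `normalize_solr_query` and the exact set of characters in the map are hypothetical, chosen for illustration; they are not taken from the Media Cloud codebase.

```python
# Hypothetical helper: the name and the character map are assumptions,
# not part of the existing API code.
_QUERY_CHAR_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}

# Build the translation table once so each call is a single pass over the string.
_TRANSLATION_TABLE = str.maketrans(_QUERY_CHAR_MAP)


def normalize_solr_query(query: str) -> str:
    """Return the query with smart quotes and long dashes replaced by ASCII."""
    return query.translate(_TRANSLATION_TABLE)


if __name__ == "__main__":
    print(normalize_solr_query("\u201ccovid sourdough\u201d AND bread\u2014making"))
```

One thing to decide before adopting something like this: a leading `-` negates a term in Solr's standard query syntax, so mapping a long dash to a hyphen can change what a query means if the dash sits directly in front of a term. Mapping dashes to a space instead may be the safer choice.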