internal-displacement

Studying news events and internal displacement.

Results 22 internal-displacement issues

Can we extract items such as the title and date published from a pdf?

enhancement
scraper

Take approach from `classification` notebook and integrate into interpreter for classification and filtering articles.

interpreter

During scraping, can we tag whether something is text/video/image/pdf? Extra dessert if you can discern between news/blog etc.

data-collection
scraper
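
One way the tagging idea above could be approached is to look at the HTTP `Content-Type` header and fall back to the URL's file extension. The sketch below uses only the Python standard library; the function name `tag_content_type` and the `"unknown"` fallback label are assumptions for illustration, not part of the project. Distinguishing news from blogs is not attempted here.

```python
import mimetypes
from urllib.parse import urlparse


def tag_content_type(url, content_type_header=None):
    """Return 'text', 'video', 'image', 'pdf', or 'unknown' (hypothetical labels)."""
    mime = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type_header.split(";")[0].strip().lower()
    if not mime:
        # No usable header: guess from the extension of the URL path.
        mime, _ = mimetypes.guess_type(urlparse(url).path)
    if not mime:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    major = mime.split("/")[0]
    if major in ("text", "video", "image"):
        return major
    return "unknown"
```

The header is preferred over the extension because many article URLs have no extension at all; anything that matches neither source is left as `"unknown"` rather than guessed.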

This code in `master` breaks production:

```
//if not using docker
//create a pgConfig.js file in the same directory and put your credentials there
const connectionObj = require('./pgConfig');
```

...

The `docker-compose.yml` and `docker.env` files are currently set up with local development in mind. We'll want a production-friendly config.

- Don't run localdb
- DB config refers to AWS RDS...

infrastructure

Write a function that calculates the percentage of missing fields in `report.Report` after an article has been interpreted. We may expand this later to include weighting or other factors. Discussion...

beginner-friendly
interpreter
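
A minimal sketch of the missing-fields calculation described above, assuming `report.Report` can be treated as a dataclass. The `Report` class below is a hypothetical stand-in with made-up field names, and the definition of "missing" (`None`, blank strings, empty containers) is also an assumption that the discussion may refine.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Report:
    # Hypothetical stand-in for report.Report; the real fields may differ.
    locations: List[str] = field(default_factory=list)
    event_term: Optional[str] = None
    subject_term: Optional[str] = None
    quantity: Optional[int] = None


def is_missing(value):
    """Treat None, blank strings, and empty containers as missing (assumed rule)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def percent_missing(report):
    """Return the percentage (0-100) of the report's fields that are missing."""
    all_fields = fields(report)
    if not all_fields:
        return 0.0
    missing = sum(1 for f in all_fields if is_missing(getattr(report, f.name)))
    return 100.0 * missing / len(all_fields)
```

Keeping `is_missing` separate leaves room for the weighting mentioned in the issue: a later version could multiply each field by a weight instead of counting every field equally. Note that a quantity of `0` counts as present under this rule.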

Here's a sketch of an infrastructure plan:

## Development

Scrapers run locally (on developer machine) in Docker for prototyping (internal-displacement repo)
Write to local DB in docker
Can read scrape...

infrastructure

In `Pipeline.process_url` we make multiple calls to `article.update_status()`. The update_status method may raise `UnexpectedArticleStatusException` if it appears that the status has been changed in the meantime. `process_url` should be prepared...

enhancement
pipeline
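
The issue text is truncated, so the intended handling is not known. As one possible reading, the sketch below shows `process_url` catching `UnexpectedArticleStatusException` and stopping work on that article. The `Article` class, the `steps` argument, and the return strings are all hypothetical stand-ins, not the project's actual code.

```python
class UnexpectedArticleStatusException(Exception):
    """Stand-in for the project's exception of the same name."""


class Article:
    # Hypothetical minimal article; the real class is presumably a database model.
    def __init__(self, url, status="new"):
        self.url = url
        self.status = status
        self._stored_status = status  # simulates the value held in the database

    def update_status(self, new_status):
        # Refuse to update if someone else changed the stored status meanwhile.
        if self._stored_status != self.status:
            raise UnexpectedArticleStatusException(
                f"expected {self.status!r}, found {self._stored_status!r}"
            )
        self.status = self._stored_status = new_status


def process_url(article, steps):
    """Run (status, action) steps in order; stop if the status changed elsewhere."""
    for status, action in steps:
        try:
            article.update_status(status)
        except UnexpectedArticleStatusException:
            # Assumed policy: another worker owns this article, so back off.
            return "skipped: status changed concurrently"
        action(article)
    return "processed"
```

Wrapping each `update_status()` call, rather than the whole function, keeps the check next to the step it guards, so an action never runs after a failed status update.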

Make sure the pipeline works with PDF articles in different scenarios:

- Nonexistent / broken URL
- Non-English
- Irrelevant
- Relevant

Ideally include some tests in `tests/test_Pipeline.py`.

pipeline

Write a function in `article.Article` that calculates the percentage of scraped fields which are returned empty. We may consider expanding the definition of scraping reliability later, so suggestions welcome.

beginner-friendly
interpreter