internal-displacement
Studying news events and internal displacement.
Trace
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/George/miniconda3/envs/d4d-internal-displacement/lib/python3.6/http/client.py in _get_chunk_left(self)
    545             try:
--> 546                 chunk_left = self._read_next_chunk_size()
    547             except ValueError:

/Users/George/miniconda3/envs/d4d-internal-displacement/lib/python3.6/http/client.py in _read_next_chunk_size(self)
    512             try:
--> 513                 return...
```
As more articles are gathered, analysed and verified by a human, it would be nice for the ML models to self-update. Open to discussion on tools and best practices...
There are lots of unused imports, and things like the notebooks could be better organised.
Hello @simonb83, the documentation for the internal-displacement project is excellent. I wanted to make a suggestion to add on the first page of the installation wiki: [wikipage](https://github.com/Data4Democracy/internal-displacement/wiki) Add a numbered...
Updated broken links to https://unite.un.org/ideas/content/idetect using Internet Archive links. Also added/cleaned links in a couple of other places.
Currently we just return the article if it is scraped successfully, but only the message "retrieval failed" if not. Would be good to add the HTTP status code.
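One way the failure case could carry the status code is sketched below. This is only an illustration using the standard library: `FetchResult`, `fetch`, and the injectable `opener` parameter are hypothetical names, and the project's scraper may well use a different HTTP client.

```python
import http.client
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class FetchResult(NamedTuple):
    """Hypothetical result type: carries the status code alongside the message."""
    ok: bool
    status: Optional[int]   # HTTP status code, or None if no response arrived
    body: Optional[bytes]
    message: str


def fetch(url: str, timeout: float = 10.0,
          opener=urllib.request.urlopen) -> FetchResult:
    """Fetch a URL and report the HTTP status code on failure as well as success."""
    try:
        with opener(url, timeout=timeout) as resp:
            return FetchResult(True, resp.status, resp.read(), "retrieved")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 503).
        return FetchResult(False, err.code, None,
                           f"retrieval failed: HTTP {err.code}")
    except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
        # No usable response: DNS failure, timeout, malformed status line, etc.
        return FetchResult(False, None, None, f"retrieval failed: {err}")
```

A caller could then store `result.status` next to the article row, so that a 404 can be told apart from a transient 503.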
Doesn't seem to happen very often, but have experienced a couple of timeouts while scraping (every few thousand articles). Will post the trace for the next one.
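Until a trace is available, occasional timeouts could be absorbed with a small retry wrapper. The sketch below is an assumption about how that might look, not the project's code: `with_retries` and its parameters are hypothetical, and which exceptions count as retryable would depend on the HTTP client the scraper actually uses.

```python
import socket
import time


def with_retries(fn, attempts=3, backoff=1.0,
                 retry_on=(socket.timeout, TimeoutError), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again, up to `attempts` calls.

    The wait doubles after each failure: backoff, 2*backoff, 4*backoff, ...
    The last failure is re-raised so the caller can log it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(backoff * 2 ** (attempt - 1))
```

With one timeout every few thousand articles, two or three attempts should be plenty; the re-raise keeps persistent failures visible.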
Sometimes no publication date is available and a blank string is returned. However, the DB model expects a datetime. Possible fix in `scraper.Scraper.html_article`:
```
if not isinstance(a.publish_date, datetime.datetime):
    article_pub_date...
```
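The proposed fix is cut off above, so the following is only a guess at its shape. It assumes the date column accepts NULL, and `normalize_pub_date` is a hypothetical helper name rather than something taken from the repository.

```python
import datetime
from typing import Optional


def normalize_pub_date(value) -> Optional[datetime.datetime]:
    """Return `value` unchanged if it is a datetime, otherwise None.

    Assumption: the DB column is nullable, so None is stored as NULL
    instead of a blank string that the model would reject.
    """
    if isinstance(value, datetime.datetime):
        return value
    return None
```

If the column is not nullable, a sentinel date or a schema change would be needed instead.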
We would like to have the front end be able to submit new URLs to process by writing an article row into the DB with a status of NEW. We...
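A minimal sketch of that hand-off, using an in-memory SQLite database as a stand-in for the project's database. The table name `article`, its columns, and the `PROCESSING` status are assumptions made for illustration; only the `NEW` status comes from the description above.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical, minimal schema; the real model has more columns.
    conn.execute(
        "CREATE TABLE article ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, status TEXT NOT NULL)"
    )


def submit_url(conn, url):
    """Front-end side: queue a URL by inserting a row with status NEW."""
    cur = conn.execute(
        "INSERT INTO article (url, status) VALUES (?, 'NEW')", (url,)
    )
    conn.commit()
    return cur.lastrowid


def claim_new(conn, limit=10):
    """Back-end side: pick up NEW rows and mark them as being processed."""
    rows = conn.execute(
        "SELECT id, url FROM article WHERE status = 'NEW' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for row_id, _url in rows:
        conn.execute(
            "UPDATE article SET status = 'PROCESSING' WHERE id = ?", (row_id,)
        )
    conn.commit()
    return rows
```

With several workers polling at once, the select-then-update step would need to happen in a single transaction with row locking so that no row is claimed twice.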
Trace
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/George/miniconda3/envs/d4d-internal-displacement/lib/python3.6/http/client.py in _read_status(self)
    282         try:
--> 283             status = int(status)
    284             if status < 100 or status > 999:

ValueError: invalid...
```