internal-displacement
Scraper - Tag content type
During scraping, can we tag whether something is text/video/image/pdf? Extra dessert if you can discern between news/blog etc.
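One possible way to do the basic tagging, as a rough sketch only: look at the HTTP `Content-Type` header when the scraper has it, and fall back to the file extension in the URL otherwise. The function name `tag_content_type` and the set of labels are assumptions for illustration, not part of the existing scraper, and this does not attempt the news/blog distinction.

```python
import mimetypes
from urllib.parse import urlparse


def tag_content_type(url, content_type_header=None):
    """Return 'text', 'video', 'image', 'pdf', or 'unknown' (hypothetical labels)."""
    mime = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8"
        mime = content_type_header.split(";")[0].strip().lower()
    if not mime:
        # Fall back to guessing from the file extension in the URL path
        mime, _ = mimetypes.guess_type(urlparse(url).path)
    if not mime:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    if mime.startswith("video/"):
        return "video"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("text/") or mime == "application/xhtml+xml":
        return "text"
    return "unknown"
```

The header value would come from whatever HTTP client the scraper already uses; passing it in as a plain string keeps the function independent of that choice.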
For video / image pages with no accompanying text, we are likely to end up tagging the link as not relevant, since the idea is to base relevance on whether or not the page talks about the reporting terms.
So in both these cases I feel that these pages are likely to be mixed content?
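To make the "mixed content" idea concrete, here is a minimal sketch that counts embedded media tags and visible text in an HTML page using only the standard library. The class name, the label names, the choice of `video`/`img`/`iframe` as media tags, and the 200-character threshold are all assumptions picked for illustration.

```python
from html.parser import HTMLParser


class MediaTextCounter(HTMLParser):
    """Counts media tags and visible text characters (illustrative helper)."""

    MEDIA_TAGS = {"video", "img", "iframe"}  # assumed set of media tags
    SKIP_TAGS = {"script", "style"}          # contents are not visible text

    def __init__(self):
        super().__init__()
        self.media_count = 0
        self.text_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.MEDIA_TAGS:
            self.media_count += 1
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that sits outside script/style blocks
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def classify_page(html, min_text_chars=200):
    """Return 'mixed', 'media-only', 'text', or 'unknown' (hypothetical labels)."""
    counter = MediaTextCounter()
    counter.feed(html)
    counter.close()
    has_media = counter.media_count > 0
    has_text = counter.text_chars >= min_text_chars  # threshold is a guess
    if has_media and has_text:
        return "mixed"
    if has_media:
        return "media-only"
    if has_text:
        return "text"
    return "unknown"
```

A page tagged `media-only` is the case described above, where there is no text to check against the reporting terms, so it could be flagged separately rather than marked not relevant.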
Have you guys ever considered Alchemy Data News as a source? You can also have it return RSS feeds from sites, if present.
Thanks @Cl34r. Checking that out.
Latest approach in classification
notebook