internal-displacement

Scraper - Tag content type

Open · georgerichardson opened this issue 8 years ago · 4 comments

During scraping, can we tag whether each item is text, video, image, or PDF? Extra dessert if you can also tell news articles apart from blog posts, etc.
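A minimal sketch of how the scraper might tag the basic content type, assuming it has the URL and, when available, the HTTP `Content-Type` header of each fetched page. The function name `tag_content_type` and the label strings are hypothetical. It trusts the header first and falls back to guessing from the URL's file extension with the standard-library `mimetypes` module. Telling news apart from blogs is not covered here.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse


def tag_content_type(url: str, content_type_header: Optional[str] = None) -> str:
    """Return 'text', 'video', 'image', 'pdf' or 'unknown' for a scraped URL.

    Hypothetical helper: prefers the server's Content-Type header and falls
    back to the file extension in the URL path.
    """
    mime = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type_header.split(";", 1)[0].strip().lower()
    if not mime:
        # No usable header: guess from the extension (e.g. ".pdf", ".mp4").
        mime, _encoding = mimetypes.guess_type(urlparse(url).path)
    if not mime:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    if mime.startswith("video/"):
        return "video"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("text/") or mime == "application/xhtml+xml":
        return "text"
    return "unknown"
```

For example, `tag_content_type("http://example.com/page", "text/html; charset=utf-8")` gives `"text"`, while an extensionless URL with no header stays `"unknown"` rather than being guessed.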

georgerichardson · Feb 05 '17

For video and image pages with no accompanying text, we are likely to end up tagging the link as not relevant, since relevance is judged on whether the page mentions the reporting terms.

So in both cases, I suspect these pages are likely to be mixed content?
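One way to avoid silently discarding media-only pages is sketched below. It assumes the content-type tag is already known and that relevance is a simple keyword match. The term list, the `min_words` threshold, the `needs_review` label and the function name are all hypothetical placeholders, not the project's actual classifier. Video or image pages with too little text to judge are routed to review instead of being marked not relevant.

```python
import re
from typing import Iterable

# Hypothetical single-word placeholder terms; the real reporting terms may differ.
REPORTING_TERMS = ("displaced", "evacuated", "fled", "homeless")


def relevance_label(text: str, content_type: str,
                    terms: Iterable[str] = REPORTING_TERMS,
                    min_words: int = 20) -> str:
    """Return 'relevant', 'not_relevant' or 'needs_review' for a page.

    Hypothetical sketch: video/image pages with fewer than `min_words`
    words of text are not judged on keywords at all.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if content_type in ("video", "image") and len(words) < min_words:
        # Too little text to decide: flag instead of calling it irrelevant.
        return "needs_review"
    wanted = {t.lower() for t in terms}
    return "relevant" if wanted.intersection(words) else "not_relevant"
```

Matching is on single lowercase words only, so multi-word reporting terms would need a phrase search instead.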

simonb83 · Feb 07 '17

Have you considered Alchemy Data News as a source? It can also return RSS feeds from sites, where they exist.

ghost · Feb 17 '17

Thanks @Cl34r. Checking that out.

georgerichardson · Feb 19 '17

The latest approach is in the classification notebook.

georgerichardson · May 04 '17