
Detect publish date

Open bejean opened this issue 12 years ago • 3 comments

A great feature would be detecting the publication date of the web page. This information is often located near the top or the bottom of the main text.

May 18 '12 22:05 bejean

Any ideas of 'how'?

Or even better some code :) ?

May 19 '12 11:05 karussell

BTW: at the moment the date is guessed from the URL only
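To illustrate what guessing a date from the URL can look like, here is a minimal sketch. It is not snacktory's actual implementation; the class `UrlDateSketch`, the method `guessDate`, and the regex are hypothetical and assume URLs that embed `/yyyy/mm/` or `/yyyy/mm/dd/` path segments.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of snacktory.
class UrlDateSketch {
    // Assumption: dates appear as /yyyy/mm or /yyyy/mm/dd path segments.
    private static final Pattern URL_DATE =
            Pattern.compile("/((?:19|20)\\d{2})/(\\d{1,2})(?:/(\\d{1,2}))?(?=/|$)");

    static Optional<String> guessDate(String url) {
        Matcher m = URL_DATE.matcher(url);
        while (m.find()) {
            int year = Integer.parseInt(m.group(1));
            int month = Integer.parseInt(m.group(2));
            if (month < 1 || month > 12) {
                continue; // looks like a date but the month is out of range
            }
            if (m.group(3) != null) {
                int day = Integer.parseInt(m.group(3));
                if (day >= 1 && day <= 31) {
                    return Optional.of(String.format("%04d-%02d-%02d", year, month, day));
                }
            }
            // No usable day segment: fall back to year and month only.
            return Optional.of(String.format("%04d-%02d", year, month));
        }
        return Optional.empty();
    }
}
```

A URL-only guess like this fails for any site that does not put the date in the path, which is the limitation this issue is about.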

Jul 12 '12 08:07 karussell

Hi, I tested this and it is a good first step. I hadn't really thought of doing it this way. Maybe create an array of regexes and apply them to the extracted text.
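The regex idea could be sketched roughly as follows. This is only an illustration under assumptions: the class `PublishDateSketch`, the method `findDate`, and the two patterns are hypothetical, it needs Java 8+ for `java.time`, and it only handles ISO dates and English "Month d, yyyy" dates.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of snacktory.
class PublishDateSketch {
    // Each regex is paired with the formatter that parses what it matches.
    // Insertion order is the order in which patterns are tried.
    private static final Map<Pattern, DateTimeFormatter> PATTERNS = new LinkedHashMap<>();
    static {
        // e.g. 2012-05-18
        PATTERNS.put(Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b"),
                DateTimeFormatter.ISO_LOCAL_DATE);
        // e.g. May 18, 2012 (English month names only)
        PATTERNS.put(Pattern.compile("\\b[A-Z][a-z]+ \\d{1,2}, \\d{4}\\b"),
                DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH));
    }

    // Returns the first substring of the text that parses as a real date.
    static Optional<LocalDate> findDate(String text) {
        for (Map.Entry<Pattern, DateTimeFormatter> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getKey().matcher(text);
            while (m.find()) {
                try {
                    return Optional.of(LocalDate.parse(m.group(), entry.getValue()));
                } catch (DateTimeParseException ignored) {
                    // Right shape but not a valid date; keep searching.
                }
            }
        }
        return Optional.empty();
    }
}
```

Because the date is usually near the top or bottom of the main text, one could run `findDate` on only the first and last few hundred characters of the extracted text to reduce false positives from dates mentioned in the article body.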

Anyway, as of today it is not possible to get the date directly from an ArticleTextExtractor object; the only way is to use the SHelper class:

    ArticleTextExtractor extractor = new ArticleTextExtractor();
    JResult res = extractor.extractContent(rawData);
    text = res.getText();
    title = res.getTitle();
    date = SHelper.completeDate(SHelper.estimateDate(url));

Sep 23 '12 06:09 bejean