
regex issue while parsing date from the url

Open AndyTheFactory opened this issue 2 years ago • 0 comments

Issue by vashis, Thu May 17 09:49:16 2018. Originally opened as https://github.com/codelucas/newspaper/issues/566


Ex: for https://www.sciencedaily.com/releases/2018/05/180515105704.htm, the date extracted from the URL is 2018/05/18, which is not correct: the "18" is the start of the article ID 180515105704, not a day segment. The change below prevents that.

STRICT_DATE_REGEX = r'(?<=\W)([\./\-]{0,1}(19|20)\d{2})[\./\-]{0,1}(([0-3]{0,1}[0-9][\./\-])|(\w{3,5}[\./\-]))([0-3]{0,1}[0-9][\./\-]{1})?'

It is better to use {1} at the end instead of {0,1}, so that the day digits only match when a separator follows them.
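To illustrate the difference, here is a minimal sketch using only Python's `re` module. `LOOSE_DATE_REGEX` is an assumption: it is the pattern above with the trailing quantifier put back to `{0,1}`, standing in for the behaviour being reported. The helper `first_date_match` and the second URL are hypothetical and exist only for this demonstration; this is not the library's own code path.

```python
import re

# Shared part of both patterns, copied from the regex proposed in this issue.
_COMMON = r'(?<=\W)([\./\-]{0,1}(19|20)\d{2})[\./\-]{0,1}(([0-3]{0,1}[0-9][\./\-])|(\w{3,5}[\./\-]))'

# Assumed "before" form: the separator after the day is optional.
LOOSE_DATE_REGEX = _COMMON + r'([0-3]{0,1}[0-9][\./\-]{0,1})?'
# Proposed form: the separator after the day is required.
STRICT_DATE_REGEX = _COMMON + r'([0-3]{0,1}[0-9][\./\-]{1})?'


def first_date_match(pattern, url):
    """Return the text of the first match of `pattern` in `url`, or None."""
    match = re.search(pattern, url)
    return match.group(0) if match else None


article_id_url = "https://www.sciencedaily.com/releases/2018/05/180515105704.htm"
dated_url = "https://example.com/2018/05/17/some-title"  # hypothetical URL

print(first_date_match(LOOSE_DATE_REGEX, article_id_url))   # 2018/05/18
print(first_date_match(STRICT_DATE_REGEX, article_id_url))  # 2018/05/
print(first_date_match(STRICT_DATE_REGEX, dated_url))       # 2018/05/17/
```

With the required separator, the leading "18" of the article ID is no longer picked up as a day, while a URL with a real day segment still matches in full. How the shorter match "2018/05/" is then turned into a date depends on the downstream date parser, which this sketch does not cover.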

AndyTheFactory, Oct 24 '23 12:10