newspaper4k

Sentences not tokenized properly.

Open AndyTheFactory opened this issue 2 years ago • 0 comments

Issue by deepshah, Thu May 26 09:47:05 2016. Originally opened as https://github.com/codelucas/newspaper/issues/256


Example: http://timesofindia.indiatimes.com/tech/mobiles/Worlds-cheapest-smartphone-Namotel-Acche-Din-launched-at-Rs-99/articleshow/52323726.cms

For the above article, some sentences do not get tokenized properly. These are sentences that have only HTML tags between them, so they end up fused once the tags are stripped (see: "However, the website did not open when TOI Tech tried to book the smartphone.Also, there's a rider for all those planning to buy the smartphone.")

I am not sure if this is the exact cause, but removing the "Read this story in Marathi" element from the page solves the issue.
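
For anyone trying to reproduce this outside the library, here is a minimal sketch of the suspected failure mode. It uses only the Python standard library and is not newspaper's actual extraction code. The `SAMPLE_HTML` markup is hypothetical (the `<br>` tags are an assumption about what sits between the two sentences), and `TextExtractor`, `extract_text`, and `split_sentences` are illustrative names, with a deliberately naive regex splitter standing in for the real tokenizer.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; optionally emits a separator at every tag boundary."""

    def __init__(self, separator=""):
        super().__init__()
        self.separator = separator
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.separator)

    def handle_endtag(self, tag):
        self.parts.append(self.separator)

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace so added separators do not pile up.
        return re.sub(r"\s+", " ", "".join(self.parts)).strip()


def extract_text(html, separator=""):
    parser = TextExtractor(separator)
    parser.feed(html)
    parser.close()
    return parser.text()


def split_sentences(text):
    # Naive splitter (illustrative only): break after ., ! or ? followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text)


# Hypothetical markup: the two sentences from the report, separated only by tags.
SAMPLE_HTML = (
    "However, the website did not open when TOI Tech tried to book the smartphone."
    "<br><br>"
    "Also, there's a rider for all those planning to buy the smartphone."
)

# Tags dropped with no separator: the sentences fuse into "smartphone.Also".
fused = split_sentences(extract_text(SAMPLE_HTML))

# A space emitted at each tag boundary lets the splitter see the sentence break.
separated = split_sentences(extract_text(SAMPLE_HTML, separator=" "))

print(len(fused), len(separated))
```

If the real cause matches this sketch, inserting whitespace wherever tags are removed during text extraction would be one possible fix. It would not by itself explain why removing the "Read this story in Marathi" element helps, so that part still needs investigation.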

AndyTheFactory, Oct 24 '23 07:10