
Default Keyword Extraction Includes Phone Numbers

Open tleedepriest opened this issue 2 years ago • 0 comments

After reading the paper, I expected numbers to be discarded during keyword extraction. However, when I ran the keyword extractor with its default settings on the 20 Newsgroups dataset, specifically on the document

20news_home/20news-bydate-train/misc.forsale/75935.txt

the following keywords were extracted (I removed the scores and joined the words of each keyword phrase with underscores):

excellent_condition tom accord excellent honda offer_call offer model honda_accord miles 795-5636 653-0638 highway accord_for_sale sale loaded white offer_call_tom highway_miles condition

The two hyphenated tokens, 795-5636 and 653-0638, are phone numbers. Other documents also produce hyphenated numbers as keywords.
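As a possible stopgap, the extracted keywords could be post-filtered to drop anything that looks like a hyphenated number. The sketch below is only an illustration of that idea: `drop_numeric_keywords` and `NUMERIC_TOKEN` are hypothetical names that are not part of yake, and the input is the underscore-joined keyword list shown above rather than yake's raw output.

```python
import re

# Hypothetical helper, not part of yake: matches a token made only of digits,
# optionally joined by hyphens, such as "795-5636".
NUMERIC_TOKEN = re.compile(r"^\d+(?:-\d+)*$")


def drop_numeric_keywords(keywords):
    """Return the keywords that contain no purely numeric token."""
    kept = []
    for keyword in keywords:
        # Phrases in this issue are joined with underscores, so split on them.
        tokens = keyword.split("_")
        if not any(NUMERIC_TOKEN.match(token) for token in tokens):
            kept.append(keyword)
    return kept


# Keyword list copied from the output reported above.
extracted = (
    "excellent_condition tom accord excellent honda offer_call offer model "
    "honda_accord miles 795-5636 653-0638 highway accord_for_sale sale loaded "
    "white offer_call_tom highway_miles condition"
).split()

filtered = drop_numeric_keywords(extracted)
print(filtered)
```

This only hides the symptom after extraction; it does not change how candidates are scored, so whether numbers should be discarded earlier is still the open question.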

tleedepriest, Jan 28 '22 16:01