yake
Default Keyword Extraction Includes Phone Numbers
After reading the paper, I expected numbers to be discarded during keyword extraction. However, when I ran the keyword extractor on the 20 Newsgroups dataset, specifically on the document
20news_home/20news-bydate-train/misc.forsale/75935.txt
the following keywords were extracted (I removed the scores and joined the words of each keyword phrase with underscores):
excellent_condition tom accord excellent honda offer_call offer model honda_accord miles 795-5636 653-0638 highway accord_for_sale sale loaded white offer_call_tom highway_miles condition
There are other instances of numbers with hyphens extracted as well.
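As a possible workaround in the meantime, the extracted keywords could be post-filtered to drop tokens made up only of digits and separators. The following is a minimal sketch, not part of yake: `NUMERIC_TOKEN` and `drop_numeric_keywords` are hypothetical names, and the pattern assumes that the unwanted keywords look like `795-5636` (digit groups joined by hyphens, dots, or spaces).

```python
import re

# Hypothetical post-filter, not part of yake: matches tokens consisting only
# of digit groups joined by hyphens, dots, or spaces (e.g. "795-5636").
NUMERIC_TOKEN = re.compile(r"\d+(?:[-.\s]\d+)*")


def drop_numeric_keywords(keywords):
    """Return the keywords that are not purely numeric tokens."""
    return [kw for kw in keywords if not NUMERIC_TOKEN.fullmatch(kw)]


# Keywords reported above, scores removed and phrases joined with underscores.
extracted = (
    "excellent_condition tom accord excellent honda offer_call offer model "
    "honda_accord miles 795-5636 653-0638 highway accord_for_sale sale "
    "loaded white offer_call_tom highway_miles condition"
).split()

filtered = drop_numeric_keywords(extracted)
print(filtered)
```

This only hides the symptom after extraction; it does not change how the extractor scores or tokenizes candidates, so phrases that mix words and numbers would still pass through.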