FinanceDatabase

add product names

Open mrx23dot opened this issue 4 years ago • 2 comments

Here is an example that extracts products from the summary field. It uses NLP to detect nouns, then matches those nouns against a product database. Building up the DB requires some work; I added the 3K most common items.
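As a rough illustration of the matching step, here is a minimal sketch. It is not the script from products.zip: the POS-tagging step is swapped for a plain regex tokenizer with a naive trailing-"s" strip, so that it runs without an NLP library. `PRODUCT_DB`, `extract_products` and the summary sentence are made up for the example.

```python
import re

# Hypothetical stand-in for the product database (the real one has ~3K items).
PRODUCT_DB = {"console", "server", "software", "tablet", "window", "windows", "xbox"}


def extract_products(summary, product_db=PRODUCT_DB):
    """Return the sorted, de-duplicated product terms found in a summary."""
    candidates = set()
    for token in re.findall(r"[a-z0-9]+", summary.lower()):
        candidates.add(token)
        # Naive singular form; a real pipeline would use a lemmatizer
        # and keep only tokens tagged as nouns.
        if len(token) > 3 and token.endswith("s"):
            candidates.add(token[:-1])
    return sorted(candidates & product_db)


# Made-up summary text, for illustration only.
summary = "Microsoft sells Windows software, server products, Xbox consoles and Surface tablets."
products = extract_products(summary)
print(products)
```

Keeping both the raw token and its stripped form is why a word like "windows" can yield two entries, which is the sort of thing post filtering would clean up.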

Company descriptions can be scraped from different places to cover a wider range of products. Once we have the products, we can easily find precise competitors.
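One possible way to turn product lists into competitor candidates is to rank companies by the overlap of their product sets. The sketch below uses Jaccard similarity, which is my assumption and not something the script does; the tickers AAA and BBB and their product lists are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_competitors(symbol, products_by_symbol):
    """Return (symbol, score) pairs with non-zero overlap, best match first."""
    target = products_by_symbol[symbol]
    scores = [
        (other, jaccard(target, products))
        for other, products in products_by_symbol.items()
        if other != symbol
    ]
    # Drop companies with no shared products, then sort by score, then by name.
    return sorted(
        (pair for pair in scores if pair[1] > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )


# AAA and BBB are placeholder tickers with made-up product lists.
products_by_symbol = {
    "MSFT": ["console", "server", "software", "tablet", "window", "windows", "xbox"],
    "AAA": ["console", "software"],
    "BBB": ["tractor"],
}
ranked = rank_competitors("MSFT", products_by_symbol)
print(ranked)
```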

Extracted products for MSFT: ['console', 'server', 'software', 'tablet', 'window', 'windows', 'xbox']. Post-filtering can be added.

If anyone has time, please pick this up. Install instructions are at the top. It would make sense to start with the SP500, then Nasdaq, then the rest of the symbols. products.zip

mrx23dot, Feb 16 '21 17:02

Could you elaborate on where you got the 'common 3K items'? I would like to include a 'keywords' key in the JSON files, like the ones you list for MSFT.

JerBouma, May 18 '21 16:05

I searched Google with as many product keywords as possible and took the first result, with minimal normalisation. I extracted everything from there, so I didn't write it up. Keeping the keywords to a minimum would be better for classification.

I would go the other way around and get multiple descriptions for one company, thus increasing the chance that a product is mentioned (also less labor-intensive).
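A minimal sketch of that idea, assuming the descriptions have already been collected as plain strings: run the same matching over each one and count how many descriptions mention each product. `PRODUCT_DB`, `products_in`, `merge_descriptions` and the sample texts are hypothetical, and plain token matching stands in for the NLP step.

```python
import re
from collections import Counter

# Hypothetical, tiny product database for the example.
PRODUCT_DB = {"console", "server", "software", "tablet"}


def products_in(text, product_db=PRODUCT_DB):
    """Product terms that appear as whole tokens in one description."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) & product_db


def merge_descriptions(descriptions, product_db=PRODUCT_DB):
    """Count, per product, how many descriptions mention it."""
    counts = Counter()
    for text in descriptions:
        counts.update(products_in(text, product_db))
    return counts


# Made-up descriptions of one company from three imagined sources.
descriptions = [
    "Makes software and a game console.",
    "Sells server software.",
    "Known for its tablet.",
]
counts = merge_descriptions(descriptions)
print(counts.most_common())
```

The counts could also serve as a simple post filter, for example by keeping only products mentioned in more than one description.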

mrx23dot, May 18 '21 19:05