sumgram Fix `InvalidParameterError` on `CountVectorizer`

Open jefromyers opened this issue 2 years ago • 0 comments

This looks like a really interesting project! I was trying to play around with it, starting with the examples in the README.md, and kept running into an `InvalidParameterError`.

The example I was trying:

import json
from sumgram.sumgram import get_top_sumgrams

doc_lst = [
    {'id': 0, 'text': 'The eye of Category 4 Hurricane Harvey is now over Aransas Bay. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 132 mph. A station at Aransas Wildlife Refuge run by the Texas Coastal Observing Network recently reported a sustained wind of 75 mph with a gust to 99 mph. A station at Rockport reported a pressure of 945 mb on the western side of the eye.'},
    {'id': 1, 'text': 'Eye of Category 4 Hurricane Harvey is almost onshore. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 120 mph.'},
    {'id': 2, 'text': 'Hurricane Harvey has become a Category 4 storm with maximum sustained winds of 130 mph. Sustained hurricane-force winds are spreading onto the middle Texas coast.'}
  ]

'''
  Use 'add_stopwords' to include list of additional stopwords not included in stopwords list (https://github.com/oduwsdl/sumgram/blob/0224fc9d54034a25e296dd1c43c09c76244fc3c2/sumgram/util.py#L31)
'''
params = {
    'top_sumgram_count': 10,
    'add_stopwords': ['image'],
    'no_rank_sentences': True,
    'title': 'Top sumgrams for Hurricane Harvey text collection'
}

ngram = 2
sumgrams = get_top_sumgrams(doc_lst, ngram, params=params)
with open('sumgrams.json', 'w') as outfile:
  json.dump(sumgrams, outfile, indent=2)

I think `CountVectorizer` requires a string, a list, or None, and you were supplying a set. I just cast it to a list. I'm not sure if this is a real issue (I didn't see it in any current Issues) or something I messed up on my end, but I thought I'd submit it in case it helps.
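
For illustration, here is a minimal sketch of the kind of cast described above. It assumes the set in question is the stopword collection handed to `CountVectorizer` (most likely through its `stop_words` argument). `normalize_stop_words` is a hypothetical helper written for this example, not a function from sumgram or scikit-learn.

```python
def normalize_stop_words(stop_words):
    """Convert a stopword collection into a string, list, or None.

    Hypothetical helper for illustration only; it is not part of
    sumgram or scikit-learn.
    """
    # None and plain strings (e.g. 'english') pass through unchanged.
    if stop_words is None or isinstance(stop_words, str):
        return stop_words
    # Sets, frozensets, tuples, and other iterables become a sorted list,
    # so the result is deterministic from run to run.
    return sorted(stop_words)


stopwords_set = {'image', 'the', 'of'}
stop_words = normalize_stop_words(stopwords_set)
print(stop_words)  # ['image', 'of', 'the']
```

The resulting list could then be passed on in place of the original set, for example as `CountVectorizer(stop_words=stop_words)`. Sorting is a choice made here for reproducibility; a plain `list(...)` cast would serve the same purpose.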

Feb 17 '23 04:02 jefromyers
