Questions about vectorizer.py
Nice work on this.
While reading the code, I ran into a couple of questions:
(1) self.initial_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df = 3 / len(input_text), max_df=.4): the 3 needs to be changed to 3., otherwise integer division makes min_df = 0.
(2) For every term, you compute fisher_exact and keep the terms with the higher p-values. Maybe you should keep the terms with the lower p-values instead.
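To make both points concrete, here is a minimal sketch of what I mean. It is not the code from vectorizer.py: the helper names and the term counts are made up for illustration, and I am assuming the min_df issue comes from Python 2 style integer division. The p-value is computed by hand from the hypergeometric distribution so the snippet needs only the standard library; as far as I know it matches the one-sided value from scipy.stats.fisher_exact(table, alternative="greater").

```python
from math import comb


def min_df_fraction(n_docs, min_count=3):
    """Express min_count as a fraction of n_docs using true division.

    With integer division (Python 2's `3 / n`), this would be 0 for n > 3.
    """
    return float(min_count) / n_docs


def fisher_greater_pvalue(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more in the top-left cell
    when the row and column totals are held fixed (hypergeometric null).
    """
    row1 = a + b          # documents in the target class
    col1 = a + c          # documents containing the term
    n = a + b + c + d     # all documents
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts per term:
# (in class with term, in class without, out of class with term, out of class without)
counts = {
    "great": (8, 2, 1, 9),   # strongly associated with the class
    "the": (5, 5, 5, 5),     # no association
}

pvals = {term: fisher_greater_pvalue(*table) for term, table in counts.items()}

# Sort ascending so the most significant terms (lowest p-value) come first.
ranked = sorted(pvals, key=pvals.get)
```

Here ranked puts "great" ahead of "the", which is the ordering I would expect if the goal is to keep the terms most associated with the class.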
Sorry for taking so long to reply! Must have missed this issue.
You're right on both counts -- thanks for noticing. I'll fix soon.