bugbug
Try using tries to store DictVectorizer and Count/TfidfVectorizer vocabularies
This should reduce memory usage. See also https://github.com/scikit-learn/scikit-learn/issues/2639.
We can use the pygtrie library here.
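
To make the idea concrete, here is a rough sketch of a vocabulary (term to column index) stored in a character trie, so that terms with a common prefix share nodes. `TrieVocabulary` is a hypothetical name, not part of scikit-learn, pygtrie, or this project, and the sketch uses only the standard library. pygtrie's `CharTrie` offers a comparable dict-like interface and could probably stand in for it. Whether this saves memory would need measuring: a trie built from nested Python dicts like this one may well use more memory than a flat dict, so it only illustrates the interface.

```python
from collections.abc import Mapping


class TrieVocabulary(Mapping):
    """Hypothetical read-only term -> column index mapping backed by a character trie."""

    # Sentinel key marking the end of a stored term. All other keys in a node
    # are single characters, so the empty string cannot collide with them.
    _END = ""

    def __init__(self, items=()):
        self._root = {}
        self._size = 0
        for term, index in dict(items).items():
            self._insert(term, index)

    def _insert(self, term, index):
        node = self._root
        for ch in term:
            # Terms sharing a prefix walk through the same nodes.
            node = node.setdefault(ch, {})
        if self._END not in node:
            self._size += 1
        node[self._END] = index

    def __getitem__(self, term):
        node = self._root
        try:
            for ch in term:
                node = node[ch]
            return node[self._END]
        except KeyError:
            raise KeyError(term) from None

    def __len__(self):
        return self._size

    def __iter__(self):
        # Depth-first walk that rebuilds each stored term from its path.
        stack = [("", self._root)]
        while stack:
            prefix, node = stack.pop()
            for key, child in node.items():
                if key == self._END:
                    yield prefix
                else:
                    stack.append((prefix + key, child))


vocab = TrieVocabulary({"tree": 0, "trie": 1, "tries": 2})
print(vocab["trie"], "tr" in vocab, len(vocab))
```

Because it implements `Mapping`, lookups such as `vocab[term]` and `term in vocab` work as they do with a plain dict, which is the access pattern a vectorizer vocabulary needs at transform time.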