Sean Massung
This is the first issue. I've labeled it as a comment by selecting the "Labels" button on the right side of this text box. Issues are cool because you can...
Re: discussion in #150 -- right now feature selection is classification-centric.
Create a filter that replaces all numbers with the same token. This is not an alpha filter; we want to represent that numbers occurred, while collapsing them all into one...
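A minimal sketch of such a filter in C++, assuming the text has already been tokenized into a vector of strings. The names `is_number` and `collapse_numbers`, the `<num>` placeholder, and the regex that decides what counts as a number are all hypothetical choices, not an existing filter API.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not an existing API): true if the whole token looks
// like a number, e.g. "42", "-7", "3.14", "1,000,000". Tokens such as "3rd"
// do not match because std::regex_match requires the entire token to match.
bool is_number(const std::string& token)
{
    static const std::regex number_re{R"([+-]?\d+([.,]\d+)*)"};
    return std::regex_match(token, number_re);
}

// Replaces every numeric token with the same placeholder instead of dropping
// it, so the output still records that a number occurred at that position.
std::vector<std::string> collapse_numbers(const std::vector<std::string>& tokens,
                                          const std::string& placeholder = "<num>")
{
    std::vector<std::string> out;
    out.reserve(tokens.size());
    for (const auto& tok : tokens)
        out.push_back(is_number(tok) ? placeholder : tok);
    return out;
}
```

Because numeric tokens are mapped rather than removed, the tokens "paid 1,000 for 3 items" come out with two `<num>` occurrences, so downstream features can still count how often numbers appeared.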
Parse a list of TREC files with multiple tags per file, etc. Could support .gz TREC files.
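One possible shape for the parser, sketched in C++ under the assumption that each file uses standard SGML-style TREC markup (`<DOC>`, `<DOCNO>`, and content tags such as `<TEXT>`) with several documents per file. `trec_doc`, `parse_trec`, and `parse_trec_files` are hypothetical names; the caller picks which content tags to keep, and their bodies are concatenated per document. Reading .gz files is not handled here and would need a decompression step (for example via zlib) in front of `parse_trec`.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one TREC document.
struct trec_doc
{
    std::string docno;
    std::string text;
};

// Strips leading and trailing whitespace.
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns the body of the first <tag>...</tag> in `doc`, or "" if absent.
std::string extract_tag(const std::string& doc, const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    auto start = doc.find(open);
    if (start == std::string::npos)
        return "";
    start += open.size();
    auto end = doc.find(close, start);
    if (end == std::string::npos)
        return "";
    return doc.substr(start, end - start);
}

// Splits one file's contents into <DOC> blocks and keeps the DOCNO plus the
// concatenated bodies of the requested content tags.
std::vector<trec_doc> parse_trec(const std::string& contents,
                                 const std::vector<std::string>& content_tags)
{
    const std::string open = "<DOC>";
    const std::string close = "</DOC>";
    std::vector<trec_doc> docs;
    std::size_t pos = 0;
    while ((pos = contents.find(open, pos)) != std::string::npos)
    {
        auto end = contents.find(close, pos);
        if (end == std::string::npos)
            break; // truncated final document: stop here
        auto body = contents.substr(pos + open.size(), end - pos - open.size());
        trec_doc d;
        d.docno = trim(extract_tag(body, "DOCNO"));
        for (const auto& tag : content_tags)
        {
            auto part = trim(extract_tag(body, tag));
            if (part.empty())
                continue;
            if (!d.text.empty())
                d.text += ' ';
            d.text += part;
        }
        docs.push_back(std::move(d));
        pos = end + close.size();
    }
    return docs;
}

// Parses a list of files; unreadable files are skipped in this sketch.
std::vector<trec_doc> parse_trec_files(const std::vector<std::string>& paths,
                                       const std::vector<std::string>& content_tags)
{
    std::vector<trec_doc> all;
    for (const auto& path : paths)
    {
        std::ifstream in{path};
        if (!in)
            continue;
        std::stringstream buf;
        buf << in.rdbuf();
        auto docs = parse_trec(buf.str(), content_tags);
        all.insert(all.end(), docs.begin(), docs.end());
    }
    return all;
}
```

The string search is deliberately simple: it assumes tags are not nested and are written in upper case, which would need revisiting for collections that violate either assumption.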
`language_model` needs the ability to estimate from a corpus instead of requiring a .arpa file.
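As a starting point, the counts that an .arpa file would otherwise supply can be collected directly from tokenized sentences. The sketch below is a hypothetical `bigram_model`, not the existing `language_model` interface, and it uses add-one (Laplace) smoothing, P(w | h) = (c(h, w) + 1) / (c(h) + |V|). Models stored in .arpa files are usually built with stronger methods such as Kneser-Ney with backoff weights, so treat this only as an illustration of estimating from a corpus.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical corpus-estimated bigram model (not the `language_model` class).
// Add-one smoothing: P(w | h) = (c(h, w) + 1) / (c(h) + |V|), where |V| counts
// every word that can be predicted, including the end marker "</s>".
// Assumes a non-empty corpus, otherwise |V| is 0.
class bigram_model
{
  public:
    explicit bigram_model(const std::vector<std::vector<std::string>>& sentences)
    {
        for (const auto& sentence : sentences)
        {
            std::string prev = "<s>"; // sentence-start history
            for (const auto& word : sentence)
            {
                observe(prev, word);
                prev = word;
            }
            observe(prev, "</s>"); // the sentence end is predicted too
        }
    }

    // Smoothed probability of `word` following `history`.
    double prob(const std::string& history, const std::string& word) const
    {
        double pair_count = lookup(bigrams_, std::make_pair(history, word));
        double hist_count = lookup(histories_, history);
        return (pair_count + 1.0) / (hist_count + static_cast<double>(vocab_.size()));
    }

    std::size_t vocab_size() const { return vocab_.size(); }

  private:
    void observe(const std::string& history, const std::string& word)
    {
        ++bigrams_[{history, word}];
        ++histories_[history];
        vocab_.insert(word);
    }

    // Count for `key`, or 0 if it was never observed.
    template <class Map, class Key>
    static double lookup(const Map& m, const Key& key)
    {
        auto it = m.find(key);
        return it == m.end() ? 0.0 : static_cast<double>(it->second);
    }

    std::map<std::pair<std::string, std::string>, std::size_t> bigrams_;
    std::map<std::string, std::size_t> histories_;
    std::set<std::string> vocab_;
};
```

With the two sentences "a b" and "a c" as the corpus, the vocabulary is {a, b, c, </s>}, so `prob("<s>", "a")` is (2 + 1) / (2 + 4) = 0.5, and the smoothed probabilities after any history sum to one over that vocabulary.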
Use the output of a chunker to create features based on strings of words. This will be particularly useful when combined with the topic modeling algorithms to create phrase-based models.
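A minimal sketch of turning chunker output into phrase features, assuming the chunker emits one BIO tag per word (`B-NP`, `I-NP`, `O`, and so on). The function `chunk_features` and the `TYPE:word_word` feature naming are hypothetical; each chunk becomes a single feature, which is the kind of unit a phrase-based topic model could consume in place of single words.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: counts phrase features such as "NP:the_black_cat" from
// parallel vectors of words and BIO chunk tags.
std::map<std::string, int> chunk_features(const std::vector<std::string>& words,
                                          const std::vector<std::string>& tags)
{
    if (words.size() != tags.size())
        throw std::invalid_argument{"words and tags must be the same length"};

    std::map<std::string, int> counts;
    std::string label;  // chunk type of the open chunk, e.g. "NP"
    std::string phrase; // words of the open chunk joined by '_'

    // Emits the open chunk (if any) as a feature and resets the state.
    auto flush = [&]() {
        if (!phrase.empty())
            ++counts[label + ":" + phrase];
        phrase.clear();
        label.clear();
    };

    for (std::size_t i = 0; i < words.size(); ++i)
    {
        const std::string& tag = tags[i];
        bool is_chunk = tag.size() > 2 && (tag[0] == 'B' || tag[0] == 'I') && tag[1] == '-';
        if (!is_chunk)
        {
            flush(); // "O" (or anything unrecognized) ends the open chunk
            continue;
        }
        std::string type = tag.substr(2);
        if (tag[0] == 'I' && !phrase.empty() && type == label)
        {
            phrase += "_" + words[i]; // continue the open chunk
        }
        else
        {
            // B- tag, or an I- tag that cannot continue the open chunk
            flush();
            label = type;
            phrase = words[i];
        }
    }
    flush();
    return counts;
}
```

For "the black cat sat on the mat" tagged `B-NP I-NP I-NP B-VP B-PP B-NP I-NP`, this yields the four features `NP:the_black_cat`, `VP:sat`, `PP:on`, and `NP:the_mat`.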