sklearn-crfsuite
                                
API Compatibility with NumPy Arrays and SciPy Matrices for features
At the moment the library only accepts a list of feature dictionaries, which for our purposes can consume an enormous amount of memory even when using generators. Would it be possible to extend the API to accept NumPy arrays or SciPy sparse matrices generated from the sklearn DictVectorizer?
@oasis789 crfsuite implements vectorization itself; that's why dicts are currently exposed. I wonder why you prefer DictVectorizer: the sklearn-crfsuite data format is largely compatible with it, with a few extra features useful for sequential models.
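For reference, the dict-based format mentioned above looks roughly like the following. This is a minimal sketch: the helper names `token_features` and `sentence_features` and the particular features are hypothetical, and only the overall shape (one list of feature dicts per sequence) comes from the discussion.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    features = {
        "bias": 1.0,                      # numeric value
        "word.lower": word.lower(),       # string value
        "word.istitle": word.istitle(),   # boolean value
        "word.length": float(len(word)),  # numeric value
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # marks the beginning of the sequence
    return features


def sentence_features(tokens):
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# X is a list of sequences; each sequence is a list of feature dicts.
sentences = [["Alice", "visited", "Paris"], ["Hello"]]
X = [sentence_features(s) for s in sentences]
```

A structure like `X`, together with a matching list of label sequences `y`, is what would be passed to `sklearn_crfsuite.CRF().fit(X, y)`.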
It could be possible to implement what you're suggesting using the crfsuite C API (https://github.com/jakevdp/pyCRFsuite did that), but it requires work.
See also: https://github.com/scrapinghub/python-crfsuite/pull/38
I wanted to put together a feature-generation pipeline that would include the CRF model and make use of sklearn feature unions. The feature unions concatenate the outputs of transformations in the form of sparse matrices. I wanted to be able to feed this directly to the CRF model within the pipeline.
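Until something like that is supported, one possible workaround is to turn each row of the sparse matrix back into a feature dict before handing it to the CRF. The sketch below is an assumption, not part of the library: `csr_rows_to_dicts` is a hypothetical helper, and `FakeCSR` is a stand-in exposing the same `indptr`, `indices` and `data` attributes that `scipy.sparse.csr_matrix` has, so that the example runs without SciPy installed.

```python
from collections import namedtuple


def csr_rows_to_dicts(matrix, feature_names):
    """Yield one {feature_name: value} dict per row of a CSR-like matrix.

    `matrix` is assumed to expose `indptr`, `indices` and `data`,
    as scipy.sparse.csr_matrix does. Only non-zero entries are emitted.
    """
    indptr, indices, data = matrix.indptr, matrix.indices, matrix.data
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        yield {
            feature_names[indices[k]]: float(data[k])
            for k in range(start, end)
        }


# Stand-in for a 2x3 CSR matrix: row 0 = [1.0, 0, 0.5], row 1 = [0, 2.0, 0].
FakeCSR = namedtuple("FakeCSR", ["indptr", "indices", "data"])
matrix = FakeCSR(indptr=[0, 2, 3], indices=[0, 2, 1], data=[1.0, 0.5, 2.0])
names = ["f_a", "f_b", "f_c"]  # hypothetical feature names

rows = list(csr_rows_to_dicts(matrix, names))
```

Because a feature union produces one flat matrix, the rows would still have to be regrouped into per-sequence lists (for example by tracking sentence lengths) before fitting a sequence model.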
Hi @kmike, are floats used as features in dictionaries taken as they are, or do they undergo any transformation? I'm asking because I'm concerned about data sparsity: for example, if I encode my feature in a [-1, 1] range, I wouldn't want the vectorizer to create a separate feature for every possible value.
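As far as I understand the python-crfsuite documentation, a float value in a feature dict is used as the weight of a single feature named by the key, while a string value produces a separate `key:value` feature with weight 1.0. The sketch below only illustrates that reading; `flatten_item` is a hypothetical helper, not library code, and the behavior is worth verifying against the installed version.

```python
def flatten_item(features):
    """Illustrate how a feature dict is assumed to map to (name, weight) pairs."""
    flat = {}
    for key, value in features.items():
        if isinstance(value, bool):
            # Booleans become a 1.0 / 0.0 weight on the same feature name.
            flat[key] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            # Numbers stay on ONE feature; the number is its weight.
            flat[key] = float(value)
        elif isinstance(value, str):
            # Strings create one feature per distinct value.
            flat[f"{key}:{value}"] = 1.0
        else:
            raise TypeError(f"unsupported value type for {key!r}")
    return flat


as_float = flatten_item({"score": -0.37, "pos": "NN", "cap": True})
as_string = flatten_item({"score": str(-0.37)})
```

Under this reading, a feature scaled to [-1, 1] and passed as a float does not multiply into many features, whereas passing the same number as a string would create one feature per distinct value.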