python-crfsuite icon indicating copy to clipboard operation
python-crfsuite copied to clipboard

Generating an ItemSequence with tuples instead of dictionaries

Open uwaisiqbal opened this issue 6 years ago • 0 comments

Would it be possible to extend the functionality of ItemSequence to allow the use of tuples. Python dictionaries have an incredible memory overhead and when using an extensive feature set of thousands of features the feature dictionaries alone can take up a lot of memory. Python dictionaries inherently are memory costly so that they have fast look up times. After looking through your code you are just using dictionaries as a store for the key/value information for features which are eventually converted into ItemSequence objects. A more optimal data structure in terms of memory would be a tuple with the first element as the feature name and the second as the feature value.

Let me know your thoughts. I think it would be a big optimisation to shift from dictionaries towards tuples for large feature sets.

uwaisiqbal avatar Nov 20 '17 10:11 uwaisiqbal