pyahocorasick

Support typed arrays as input when storing sequences

Open • pombredanne opened this issue 5 years ago • 0 comments

The STORE_SEQUENCE feature introduced with #27 is great, but it works only with tuples rather than with more general sequences of integers. In particular, support for array.array types would be great: arrays store integers much more compactly than tuples.

>>> from pympler.asizeof import asizeof as s
>>> from array import array
>>> t=tuple(xrange(15000))
>>> a=array('h', xrange(15000))
>>> s(a)
31024
>>> s(t)
480056
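The session above uses Python 2 (xrange) and the third-party pympler package. Below is a rough Python 3 sketch of the same comparison that uses only the standard library. It approximates the tuple's footprint as the tuple object plus the int objects it points to, so the exact numbers will differ from asizeof and from platform to platform.

```python
import sys
from array import array

N = 15000
t = tuple(range(N))
a = array('h', range(N))  # 'h' = signed short, a fixed width per item

# Array: one object whose buffer holds N fixed-width items
# (sys.getsizeof on an array includes its allocated buffer).
array_bytes = sys.getsizeof(a)

# Tuple: the tuple object (one pointer per item) plus every int object it
# references; the values are all distinct, so each object is counted once.
tuple_bytes = sys.getsizeof(t) + sum(sys.getsizeof(x) for x in t)

print("array:", array_bytes, "bytes")
print("tuple:", tuple_bytes, "bytes")
```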

This is because an array is limited to a single item type, so shorts, longs, or floats can each be stored in the most appropriate fixed-size type. This request is therefore for an enhancement that allows two things:

  1. using array.array as a sequence type;
  2. honoring the type of the array, e.g. storing shorts/longs/doubles exactly as such rather than in some other width (such as the 32 bits used on Py 3 or the 16 bits used on Py 2 today).
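Both points rely on information an array already carries. As a small sketch: every item of an array.array has the array's single type, out-of-range values are rejected, and the element type is exposed through the buffer protocol (format character and item size), which memoryview makes visible from Python. The helper name element_type is hypothetical, and how pyahocorasick would map these formats onto its internal letter type is not specified in this issue.

```python
from array import array

def element_type(seq):
    """Hypothetical helper: return (format, itemsize) of seq's buffer, or None.

    A C extension could read the same information from the format and
    itemsize fields of the Py_buffer filled in by PyObject_GetBuffer.
    """
    try:
        view = memoryview(seq)
    except TypeError:
        return None  # tuples and lists expose no typed buffer
    with view:
        return view.format, view.itemsize

shorts = array('h', [1, 2, 3])     # signed short
doubles = array('d', [1.5, 2.5])   # C double

print(element_type(shorts))
print(element_type(doubles))
print(element_type((1, 2, 3)))     # a tuple has no buffer

# Every item has the array's type: a value that does not fit is refused
# rather than silently stored in a wider type.
try:
    array('h', [40000])
    rejected = False
except OverflowError as exc:
    rejected = True
    print("rejected:", exc)
```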

Note that the two could be implemented separately: for instance, the integer type for sequences could be specified at construction time and used for every sequence, while various sequence types would be accepted when adding "words". Alternatively, we could simply add support for typed arrays and take the integer type from the array itself.
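The two options could be sketched as a key-normalization step run before a word is added. Everything here (normalize_key and its key_typecode argument) is hypothetical and not part of pyahocorasick's API: option 1 fixes the integer type up front and accepts any integer sequence, while option 2 accepts typed arrays and keeps their own type.

```python
from array import array

def normalize_key(key, key_typecode=None):
    """Hypothetical: convert a sequence "word" into a typed array before adding it.

    key_typecode set  -> option 1: a single type chosen at construction time;
                         any iterable of ints is accepted and converted.
    key_typecode None -> option 2: the key must be an array.array and its
                         own typecode is kept.
    """
    if key_typecode is not None:
        if isinstance(key, array) and key.typecode == key_typecode:
            return key  # already the configured type: no copy needed
        # Copies the items; raises OverflowError if a value does not fit the type.
        return array(key_typecode, key)
    if isinstance(key, array):
        return key
    raise TypeError("expected array.array when no key type is configured, got "
                    + type(key).__name__)

# Option 1: type fixed up front; tuples, lists and arrays are all accepted.
k1 = normalize_key((1, 2, 3), key_typecode='h')
k2 = normalize_key(array('i', [1, 2, 3]), key_typecode='h')  # converted to shorts

# Option 2: the array decides its own type.
k3 = normalize_key(array('q', [1, 2, 3]))
print(k1.typecode, k2.typecode, k3.typecode)
```

Option 1 keeps one letter width for the whole automaton at the cost of a conversion per key; option 2 avoids that conversion but lets keys of different widths coexist, which the automaton would presumably need to reconcile.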

pombredanne, Jun 04 '19 08:06