blist
blist copied to clipboard
Segmentation Fault
Trying to use a blist of size 4**20 and I get segmentation fault upon first read/write to blist. Using latest version from git. Running on Mac OSX on x86_64 with latest Xcode/gcc install. Commands to replicate error:
[abhik:~/Downloads]$ git clone https://github.com/DanielStutzbach/blist.git
Cloning into blist...
remote: Counting objects: 1169, done.
remote: Compressing objects: 100% (523/523), done.
remote: Total 1169 (delta 673), reused 1111 (delta 639)
Receiving objects: 100% (1169/1169), 400.80 KiB | 502 KiB/s, done.
Resolving deltas: 100% (673/673), done.
[abhik:~/Downloads/blist]$ python setup.py build_ext --inplace
Downloading http://pypi.python.org/packages/source/d/distribute/distribute-0.6.12.tar.gz
Extracting in /var/folders/Tv/Tv70U9x4EJ8RVIsTgph+fE+++TI/-Tmp-/tmprwLwFu
Now working in /var/folders/Tv/Tv70U9x4EJ8RVIsTgph+fE+++TI/-Tmp-/tmprwLwFu/distribute-0.6.12
Building a Distribute egg in /Users/abhik/Downloads/blist
/Users/abhik/Downloads/blist/distribute-0.6.12-py2.7.egg
running build_ext
building '_blist' extension
creating build
creating build/temp.macosx-10.6-x86_64-2.7
/usr/bin/gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -pipe -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DBLIST_FLOAT_RADIX_SORT=1 -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _blist.c -o build/temp.macosx-10.6-x86_64-2.7/_blist.o
creating build/lib.macosx-10.6-x86_64-2.7
/usr/bin/gcc-4.2 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/opt/local/lib build/temp.macosx-10.6-x86_64-2.7/_blist.o -o build/lib.macosx-10.6-x86_64-2.7/_blist.so
copying build/lib.macosx-10.6-x86_64-2.7/_blist.so ->
[abhik:~/Downloads/blist]$ python
Python 2.7.1 (r271:86832, Mar 8 2011, 16:24:29)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
import blist x = blist.blist([0]) * 4**20 x[10] Segmentation fault
Thanks!
Ouch, that's pretty nasty. I'm taking a look now.
Okay, I see the problem. Although blist is very memory efficient for sparse lists, it still uses O(n) memory. Some of the code assumes the list won't be larger than approximately 2**32, leading to a crash.
At minimum, it should raise a MemoryError or something like that rather than causing a segmentation fault.
In theory, blist could be altered to give up O(1) lookups in exchange for using only O(log n) memory for sparse lists. However, the implementation is non-trivial (the hard part is tracking when a list is "sparse").
Do you have a use case for making a list with 4**20 elements or were you (understandably) just playing around? :-)
Ah, I didn't realize the O(n) memory requirement. I think I misunderstood the 'sparse' in blist's description. I thought it worked somewhat like a hash and that only indexes where I set a value would be actually stored.
I do actually have a use case. I need to store (and lookup) integer values corresponding to 3 billion 20-character long DNA sequences. The python dict was too slow and memory intensive so I tried blist. I wanted to convert each DNA sequence to an integer between 0 and 4**20 (4 because alphabet is [A,C,G,T]) and then set its index in the blist to the corresponding integer value. Looks like time to pull out the rusty gcc.. :)
I have a similar use case, actually. The example provided by abhik still causes a segfault. By any chance, has the situation changed in the last several years to make this problem simpler to solve?