Chris de Vries

Results 26 issues of Chris de Vries

Currently the CMake build is specific to GCC. To make it cross-platform, it should use the find_package() function instead of manually setting GCC compile and link flags.

tools

Sparsifying the storage of bitmap lookups can make them more efficient. See https://github.com/cmdevries/LMW-tree/blob/master/src/lmw/BitMapList8.h#L53 and https://github.com/cmdevries/LMW-tree/blob/master/src/lmw/BitMapList16.h#L55. The two could also be merged into a single class while we are at it.

efficiency

Use compiler intrinsics to try different SIMD implementations of Hamming distance, for both the exclusive or and the population count.

efficiency
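As a reference point for the Hamming distance issue above, here is a minimal scalar sketch that any SIMD variant could be checked against. It assumes bit vectors are stored as 64-bit words; the names `popcount64` and `hammingDistance` are illustrative and not taken from the LMW-tree code. The only intrinsic used is the GCC/Clang builtin `__builtin_popcountll`, with a portable bit-twiddling fallback for other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Population count of one 64-bit word.
// Uses the GCC/Clang builtin when available, otherwise a portable SWAR fallback.
inline int popcount64(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
#endif
}

// Hamming distance between two equal-length bit vectors (hypothetical layout:
// one std::uint64_t per 64 bits). XOR marks differing bits, popcount counts them.
inline std::size_t hammingDistance(const std::vector<std::uint64_t>& a,
                                   const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += static_cast<std::size_t>(popcount64(a[i] ^ b[i]));
    }
    return distance;
}
```

Keeping the XOR and the population count as separate steps makes it straightforward to swap either one for a vectorised version and compare results against this baseline.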

After indexing, compress the document vectors so they can be written out in a compressed binary format. Delta encode and variable-byte encode the sparse vectors. Use https://github.com/lemire/FastPFOR.

new feature
efficiency
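To illustrate the delta plus variable-byte idea from the compression issue above, the following is a hand-rolled sketch. It does not use the FastPFOR library or its API. It assumes a sparse vector is represented by its sorted term ids as 32-bit integers; the function names `encodeDeltaVByte` and `decodeDeltaVByte` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode sorted ids as gaps between consecutive ids, then write each gap in
// variable-byte form: 7 payload bits per byte, low bits first, with the high
// bit set on every byte except the last byte of a gap.
std::vector<std::uint8_t> encodeDeltaVByte(const std::vector<std::uint32_t>& sortedIds) {
    std::vector<std::uint8_t> out;
    std::uint32_t prev = 0;
    for (std::uint32_t id : sortedIds) {
        assert(id >= prev);  // input must be sorted in ascending order
        std::uint32_t gap = id - prev;
        prev = id;
        while (gap >= 128) {
            out.push_back(static_cast<std::uint8_t>((gap & 0x7F) | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap));
    }
    return out;
}

// Reverse the encoding: rebuild each gap from its bytes, then take a running sum.
std::vector<std::uint32_t> decodeDeltaVByte(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> ids;
    std::uint32_t prev = 0;
    std::uint32_t gap = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        gap |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) {
            shift += 7;  // more bytes follow for this gap
        } else {
            prev += gap;
            ids.push_back(prev);
            gap = 0;
            shift = 0;
        }
    }
    return ids;
}
```

Small gaps dominate in sparse vectors with clustered ids, so most gaps fit in one byte. A block-oriented codec such as those in FastPFOR would likely compress and decode faster, but this version shows the data flow.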

Compress the transmission of integer accumulator vectors between machines using https://github.com/lemire/FastPFOR. Hadoop + HDFS (just get Hadoop to hand over the bytes, or use HDFS directly). ZeroMQ + GlusterFS. Apache Spark...

distributed

Use Lance's idea for cyclic generation of random index vectors. Very cache friendly.

new feature

It would be ideal for the indexer to output integer-valued document vectors with term frequencies. These can be optionally written to disk in a compressed format using https://github.com/lemire/FastPFOR to...

new feature

Use memory pools, custom allocators, and allocation of nearby vectors in contiguous memory to reduce allocation overhead and improve locality of reference.

efficiency
refactoring
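One possible shape for the memory pool issue above is a fixed-capacity pool that hands out equally sized vectors from a single contiguous buffer. This is only a sketch under assumptions: the class name `VectorPool`, the 64-bit word layout, and the bump-pointer strategy are all illustrative and not part of the existing code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pool: all vectors have the same length and live back to back
// in one buffer, so allocation is a pointer bump and neighbours share cache lines.
class VectorPool {
public:
    VectorPool(std::size_t wordsPerVector, std::size_t capacity)
        : words_(wordsPerVector),
          storage_(wordsPerVector * capacity, 0),
          used_(0) {
        assert(wordsPerVector > 0);
    }

    // Returns a pointer to the next free slot, or nullptr when the pool is full.
    std::uint64_t* allocate() {
        if ((used_ + 1) * words_ > storage_.size()) {
            return nullptr;
        }
        return storage_.data() + (used_++) * words_;
    }

    // Releases every vector at once; individual frees are not supported.
    void reset() { used_ = 0; }

    std::size_t size() const { return used_; }

private:
    std::size_t words_;                  // length of each vector in 64-bit words
    std::vector<std::uint64_t> storage_; // single contiguous backing buffer
    std::size_t used_;                   // number of vectors handed out so far
};
```

Vectors allocated one after another end up adjacent in memory, which is the locality property the issue asks for. Supporting per-vector deallocation would need a free list or a different allocator design.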

TBB flow graph would make this a very flexible set of producers and consumers.

new feature