Dan Luu

Results 101 issues of Dan Luu

I ran

~~~
tools/BitFunnel/src/BitFunnel querylog ~/dev/wikipedia.100.200/config/ 100000000 3 1
~~~

This appears to have produced a log with `99999996` terms:

~~~
$ wc QueryLog.txt
99999996 301335822 2231543773 QueryLog.txt
~~~

Running verify against the document frequency table took about 40 minutes for me on the `chunked` corpus of 14k documents. If we run against a small shard of 1m...

~~~
Query,TermPos,TruePositives,FalsePositives,FalseNegatives,FalseRate
from,,0,2838699,0,1
also,1,0,2018632,0,1
which,2,0,1961024,0,1
has,3,0,1776560,0,1
first,4,0,1705426,0,1
~~~

This happens when I build a TermTable using the `Experimental` treatment and then run analyze:

~~~
1: cd /tmp
output directory is now "/tmp".
2: analyze
Thread 1 "BitFunnel" received...
~~~

~~~
$ tools/BitFunnel/src/BitFunnel repl /tmp/wikipedia.100.200/config/ -script /tmp/script.nothing
Welcome to BitFunnel!
Starting 1 thread (plus one extra thread for the Recycler.)
directory = "/tmp/wikipedia.100.200/config/"
gram size = 1
Starting index ......
~~~

When we run filter with two "reasonable" document lengths (say, 100 and 200), we get "a lot" of small files and some empty files. This shouldn't matter once we have...

Some files have column names that start with capital letters and some have column names that start with lowercase letters.
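One way to spot this kind of inconsistency is to classify each file's header line by the case of the first letter of every column name, then compare the results across files. The sketch below is only an illustration: `classify_header` and the `header_case` enum are hypothetical names, not part of BitFunnel, and it assumes a plain comma-separated header with no quoting.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical classification of a CSV header line (not a BitFunnel type). */
enum header_case { HEADER_NONE, HEADER_UPPER, HEADER_LOWER, HEADER_MIXED };

/* Looks at the first character of each comma-separated column name.
   Assumes no quoted fields; stops at the end of the string or first newline. */
enum header_case classify_header(const char *header) {
    int saw_upper = 0, saw_lower = 0;
    int at_name_start = 1; /* true at the first character of a column name */
    for (const char *p = header; *p != '\0' && *p != '\n'; ++p) {
        if (*p == ',') {
            at_name_start = 1;
            continue;
        }
        if (at_name_start) {
            unsigned char c = (unsigned char)*p;
            if (isupper(c)) saw_upper = 1;
            else if (islower(c)) saw_lower = 1;
            at_name_start = 0;
        }
    }
    if (saw_upper && saw_lower) return HEADER_MIXED;
    if (saw_upper) return HEADER_UPPER;
    if (saw_lower) return HEADER_LOWER;
    return HEADER_NONE; /* empty header, or names that start with non-letters */
}
```

Two files whose headers classify differently (one `HEADER_UPPER`, one `HEADER_LOWER`) would be an instance of the inconsistency described above.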

~~~
int foo(void) __attribute__((pure));
int bar(void) {
  int acc = 0;
  for (int i = 0; i < 100; ++i)
    acc += foo();
  return acc;
}
int foo2(void);
int bar2(void)...
~~~
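For context on the snippet above: GCC and Clang treat `__attribute__((pure))` as a promise that a function has no observable effect other than its return value, which permits (but does not require) the compiler to reuse the result of repeated calls. The sketch below shows the two shapes the loop could take. The names `foo_counted`, `bar_loop`, and `bar_hoisted` are made up for illustration, and the call counter is instrumentation only, which is why `foo_counted` is deliberately not marked pure.

```c
#include <assert.h>

/* Instrumentation only: counts how many times the callee actually runs.
   A genuinely pure function could not update a counter like this. */
int calls = 0;

/* Stand-in for foo(); not marked pure because it updates `calls`. */
int foo_counted(void) {
    ++calls;
    return 3;
}

/* The loop as written in the source: one call per iteration. */
int bar_loop(void) {
    int acc = 0;
    for (int i = 0; i < 100; ++i)
        acc += foo_counted();
    return acc;
}

/* The shape a compiler would be allowed to produce if the callee were pure:
   a single call whose result is reused for all 100 iterations. */
int bar_hoisted(void) {
    return 100 * foo_counted();
}
```

Both functions return the same value, but the first runs the callee 100 times and the second runs it once. Whether a given compiler actually performs this transformation is a separate question that this sketch does not answer.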

IIRC, when we tried this, the issue was that the clang that you can trivially get on Mac (`brew install llvm38`) didn't have proper library support. The discussion was a...