Dan Luu

72 comments by Dan Luu

I hacked out diagnosticStream and got a ~8% speedup across multiple runs. diagnosticStream is used in both QueryPlanner and ByteCodeInterpreter and it appears to be used inside Bing QueryPlanner and...

Whoops, I was looking in the wrong file. I'll re-run without the extra constant to drop the tail because it appears that we're already dropping the tail.

Running with the current setup (no chopping off the last 3/4, looking at a Wikipedia shard with documents of lengths 100 to 150), there are 2.1 million terms. No term...

There's some code which scales the values relative to the largest value seen:
~~~
// Grab the next random number
double value = m_values[m_nextValue % m_values.size()];
m_nextValue++;
// Convert random...
~~~

In the case where we generate >= 10 million terms, the two issues are:
1. The scaling factor indexes off the end of the document frequency table (`value *= m_dft.size()` should...
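The comment above is cut off before it states the intended fix, so the following is only a generic sketch of this class of bug, not the project's actual code. It assumes the value being scaled lies in the closed range [0, 1]; `ScaledIndex` is a hypothetical helper name. Multiplying by the table size yields an index equal to the size when the value is exactly 1.0, which is one past the last valid entry, so the sketch clamps the result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original code base): map a value in
// [0, 1] to a valid index into a table that has `size` entries.
std::size_t ScaledIndex(double value, std::size_t size)
{
    assert(size > 0);
    assert(value >= 0.0 && value <= 1.0);

    // Naive scaling: when value == 1.0 this produces `size`, which is one
    // past the last valid index of the table.
    std::size_t index =
        static_cast<std::size_t>(value * static_cast<double>(size));

    // Clamp so that the largest value maps to the last entry instead.
    if (index >= size)
    {
        index = size - 1;
    }
    return index;
}
```

With a 10-entry table, `ScaledIndex(1.0, 10)` returns 9 rather than 10. Scaling by `size - 1` instead of clamping would also stay in bounds, at the cost of a slightly different mapping for intermediate values.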

I believe we do the same amount of work in the QueryPlanner destructor, so switching over won't solve the problem, but it would mean that solving the problem will fix...

Here are some histograms which confirm that we do have more correlations when moving from the mixed rank3/rank0 treatment to the rank0 only treatment: The first graph is a graph...

If we don't exclude correlations of `1`, the result looks a lot clearer, although the intermediate files become relatively large. Here are versions of the above graphs that are as...

If we look at the TermTable build, we can see that the Rank0 treatment only has twice as many rank0 rows as the Rank3/Rank0 treatment even though we expect it...

Can this actually happen? It seems like it, from `GetTermInfo`:
```
if (term.GetRawHash() < m_factsCount)
{
    // Facts are private rank 0 rows with the special row offsets.
    termKind =...
```