Dan Luu comments

72 comments


DiagnosticStream::IsEnabled curiously expensive?

I hacked out `diagnosticStream` and got a ~8% speedup across multiple runs. `diagnosticStream` is used in both `QueryPlanner` and `ByteCodeInterpreter`, and it appears to be used inside Bing QueryPlanner and...
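A minimal sketch of one common mitigation, assuming (not confirmed here) that `IsEnabled` does a prefix match against every enabled diagnostic on each call: if the set of enabled diagnostics can't change inside a hot loop, the check can be hoisted out so it runs once instead of once per iteration. `PrefixDiagnosticStream`, `CountAboveUncached`, `CountAboveCached`, and the `"planner/rows"` diagnostic name are all hypothetical, and the call counter exists only to make the difference visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stream whose IsEnabled() scans every enabled prefix on each
// call. This is an assumption, not the real DiagnosticStream implementation.
class PrefixDiagnosticStream
{
public:
    void Enable(std::string const& prefix)
    {
        // In this sketch an empty prefix (which would match everything) is
        // treated as a caller error.
        assert(!prefix.empty());
        m_enabled.push_back(prefix);
    }

    // Cost grows with the number of enabled prefixes and is paid per call.
    bool IsEnabled(std::string const& diagnostic) const
    {
        ++m_calls;
        for (auto const& prefix : m_enabled)
        {
            // True when `diagnostic` starts with `prefix`.
            if (diagnostic.compare(0, prefix.size(), prefix) == 0)
            {
                return true;
            }
        }
        return false;
    }

    // Number of IsEnabled() calls so far (instrumentation for this sketch).
    std::size_t CallCount() const { return m_calls; }

private:
    std::vector<std::string> m_enabled;
    mutable std::size_t m_calls = 0;
};

// Queries the stream on every iteration of the hot loop.
int CountAboveUncached(PrefixDiagnosticStream const& stream,
                       std::vector<int> const& values,
                       int threshold)
{
    int count = 0;
    for (int v : values)
    {
        if (stream.IsEnabled("planner/rows"))
        {
            // Diagnostic output would go here.
        }
        if (v > threshold) { ++count; }
    }
    return count;
}

// Same loop with the loop-invariant check hoisted out: one IsEnabled() call.
// Only valid if nothing enables or disables diagnostics during the loop.
int CountAboveCached(PrefixDiagnosticStream const& stream,
                     std::vector<int> const& values,
                     int threshold)
{
    bool const diagnosticsOn = stream.IsEnabled("planner/rows");
    int count = 0;
    for (int v : values)
    {
        if (diagnosticsOn)
        {
            // Diagnostic output would go here.
        }
        if (v > threshold) { ++count; }
    }
    return count;
}
```

The trade-off is that toggling diagnostics partway through the loop no longer takes effect until the next call.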

querylog only produces terms from head

Whoops, I was looking in the wrong file. I'll re-run without the extra constant that drops the tail, since it appears we're already dropping it.

querylog only produces terms from head

Running with the current setup (not chopping off the last 3/4, and using a Wikipedia shard with documents of length 100 to 150), there are 2.1 million terms. No term...

querylog only produces terms from head

There's some code which scales the values relative to the largest value seen:

~~~
// Grab the next random number
double value = m_values[m_nextValue % m_values.size()];
m_nextValue++;
// Convert random...
~~~

querylog only produces terms from head

In the case where we generate >= 10 million terms, the two issues are:

1. The scaling factor indexes off the end of the document frequency table (`value *= m_dft.size()` should...
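A minimal sketch of why `value *= m_dft.size()` can index off the end, assuming the values are non-negative and are normalized into [0, 1] by dividing by the largest value seen: at the maximum, `value * size` equals `size`, one past the last valid index. Clamping to `size - 1` is one way to keep the index in range; it isn't necessarily the change the comment goes on to suggest. `m_values` and `m_nextValue` come from the quoted snippet; `ScaledIndexGenerator`, `NextIndex`, `m_maxValue`, and `tableSize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: draws values cyclically from a precomputed table,
// scales them into [0, 1] relative to the largest value, and maps the
// result to an index into a table of size tableSize.
class ScaledIndexGenerator
{
public:
    explicit ScaledIndexGenerator(std::vector<double> values)
        : m_values(std::move(values))
    {
        assert(!m_values.empty());
        m_maxValue = *std::max_element(m_values.begin(), m_values.end());
        assert(m_maxValue > 0.0);
    }

    // Returns an index in [0, tableSize).
    std::size_t NextIndex(std::size_t tableSize)
    {
        assert(tableSize > 0);
        // Grab the next value, wrapping around the table.
        double value = m_values[m_nextValue % m_values.size()];
        ++m_nextValue;
        // Normalize relative to the largest value: value is now in [0, 1]
        // (assuming all values are non-negative).
        value /= m_maxValue;
        // value * tableSize equals tableSize when value == 1.0, one past the
        // last valid index, so clamp to tableSize - 1.
        auto index = static_cast<std::size_t>(
            value * static_cast<double>(tableSize));
        return std::min(index, tableSize - 1);
    }

private:
    std::vector<double> m_values;
    double m_maxValue = 0.0;
    std::size_t m_nextValue = 0;
};
```

Without the `std::min`, the fourth call in a sequence like `{0.0, 0.5, 1.0, 2.0}` with a table of size 10 would return 10.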

Relatively large amount of time spent in SimplePlanner destructor

I believe we do the same amount of work in the QueryPlanner destructor, so switching over won't solve the problem, but it would mean that solving the problem will fix...

Increase in false positives when substituting rank0 rows for rank3 rows

Here are some histograms which confirm that we do have more correlations when moving from the mixed rank3/rank0 treatment to the rank0-only treatment. The first graph is a graph...

Increase in false positives when substituting rank0 rows for rank3 rows

If we don't exclude correlations of `1`, the result looks a lot clearer, although the intermediate files become relatively large. Here are versions of the above graphs that are as...
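The comments don't define which correlation is being plotted. A minimal sketch of how such a histogram might be built, assuming it's the Pearson (phi) correlation between the bit vectors of pairs of rows, with an option to drop correlations of `1`. `Row`, `PhiCorrelation`, `CorrelationHistogram`, and the binning over [-1, 1] are hypothetical choices, not the code that produced the graphs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical representation: one bit per document slot.
using Row = std::vector<bool>;

// Pearson (phi) correlation of two equal-length bit rows. Returns 0 when
// either row is constant, where the correlation is undefined.
double PhiCorrelation(Row const& a, Row const& b)
{
    assert(a.size() == b.size() && !a.empty());
    double const n = static_cast<double>(a.size());
    double onesA = 0.0, onesB = 0.0, onesBoth = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        onesA += a[i] ? 1.0 : 0.0;
        onesB += b[i] ? 1.0 : 0.0;
        onesBoth += (a[i] && b[i]) ? 1.0 : 0.0;
    }
    double const numerator = n * onesBoth - onesA * onesB;
    double const denominator =
        std::sqrt(onesA * (n - onesA) * onesB * (n - onesB));
    return denominator == 0.0 ? 0.0 : numerator / denominator;
}

// Buckets the correlation of every pair of rows into `buckets` equal-width
// bins over [-1, 1]. With excludeOnes, pairs whose correlation is 1 (up to
// rounding) are skipped.
std::vector<std::size_t> CorrelationHistogram(std::vector<Row> const& rows,
                                              std::size_t buckets,
                                              bool excludeOnes)
{
    assert(buckets > 0);
    std::vector<std::size_t> histogram(buckets, 0);
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        for (std::size_t j = i + 1; j < rows.size(); ++j)
        {
            double const c = PhiCorrelation(rows[i], rows[j]);
            if (excludeOnes && c >= 1.0 - 1e-12)
            {
                continue;
            }
            // Map [-1, 1] onto [0, buckets]; c == 1 falls in the top bin.
            auto bin = static_cast<std::size_t>(
                (c + 1.0) / 2.0 * static_cast<double>(buckets));
            if (bin >= buckets)
            {
                bin = buckets - 1;
            }
            ++histogram[bin];
        }
    }
    return histogram;
}
```

Note that the pairwise loop is quadratic in the number of rows, so on a real TermTable one would likely sample pairs or restrict to rows that share documents.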

Increase in false positives when substituting rank0 rows for rank3 rows

If we look at the TermTable build, we can see that the Rank0 treatment only has twice as many rank0 rows as the Rank3/Rank0 treatment, even though we expect it...

What are the consequences of a Term::Hash colliding with a Fact?

Can this actually happen? It seems like it, from `GetTermInfo`:

```
if (term.GetRawHash() < m_factsCount)
{
    // Facts are private rank 0 rows with the special row offsets.
    termKind =...
```
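With the `term.GetRawHash() < m_factsCount` branch in `GetTermInfo`, a real term whose raw hash happens to fall below `m_factsCount` would be classified as a fact. A back-of-the-envelope sketch, assuming raw hashes are 64-bit and roughly uniformly distributed: the chance for any one term is `factsCount / 2^64`, so a hypothetical `factsCount` of 4 gives 2^-62, about 2.2 × 10^-19 per term. `RawHash`, `TermKind`, `Classify`, `CollisionProbability`, and `ExpectedCollisions` are hypothetical names, not the real code's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: raw term hashes are 64-bit and close to uniformly distributed.
using RawHash = std::uint64_t;

enum class TermKind { Fact, Normal };

// Hypothetical classifier mirroring the quoted branch: any raw hash below
// factsCount is treated as a fact, whether or not it came from a real term.
TermKind Classify(RawHash rawHash, RawHash factsCount)
{
    return rawHash < factsCount ? TermKind::Fact : TermKind::Normal;
}

// Probability that one uniformly random 64-bit hash falls in [0, factsCount).
double CollisionProbability(RawHash factsCount)
{
    double const hashSpace = std::ldexp(1.0, 64);  // 2^64
    return static_cast<double>(factsCount) / hashSpace;
}

// Expected number of real terms misclassified as facts among termCount terms.
double ExpectedCollisions(double termCount, RawHash factsCount)
{
    assert(termCount >= 0.0);
    return termCount * CollisionProbability(factsCount);
}
```

So under the uniformity assumption a collision is possible but vanishingly rare even for millions of terms; if raw hashes aren't uniformly distributed, this estimate doesn't hold.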