Adam Derewecki issues

Results: 15 issues of Adam Derewecki

Standardize performance benchmark

As we're separating algorithmic changes from performance changes, there should be a standardized way to measure how two different versions perform from a purely performance point of view -- rate,...
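One way such a standardized measurement could look is sketched below. It is only an illustration: the helper names `benchmark` and `compare` are hypothetical, and using the median wall-clock time as the metric is an assumption, since the issue text is truncated before it lists the metrics it has in mind.

```python
import statistics
import time


def benchmark(func, args, repeats=5):
    """Return the median wall-clock seconds of func(*args) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(baseline, candidate, args, repeats=5):
    """Time two implementations on the same input and report the ratio."""
    base = benchmark(baseline, args, repeats)
    cand = benchmark(candidate, args, repeats)
    # Guard against a zero timing on very fast calls.
    speedup = base / cand if cand > 0 else float("inf")
    return {"baseline_s": base, "candidate_s": cand, "speedup": speedup}
```

Running both versions on the same fixed input keeps algorithmic differences out of the comparison, for example `compare(old_intersect, new_intersect, (set_a, set_b))`, where the function and argument names are placeholders.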

Quality monitoring

As we make algorithmic changes, we should make sure that we're not degrading the quality of the recommendations.

Calculate correct best suggestions instead of heuristic

The current scoring mechanism has a percentage threshold that must be exceeded by (intersections / set_a_members_length) for the result to be scored. This is really not fair to all sets...
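A minimal sketch of the threshold check as described above, assuming the sets are plain Python sets of member IDs. The function names and the strict "greater than" comparison are assumptions for illustration, not the engine's actual code.

```python
def overlap_ratio(set_a, set_b):
    """intersections / set_a_members_length, as described in the issue."""
    if not set_a:
        return 0.0  # assumption: an empty set A never scores
    return len(set_a & set_b) / len(set_a)


def passes_threshold(set_a, set_b, threshold):
    """True when the ratio strictly exceeds the percentage threshold."""
    return overlap_ratio(set_a, set_b) > threshold
```

Note that the ratio is normalized by the size of set A only, so swapping the two arguments can change the result: `overlap_ratio({1, 2}, set(range(100)))` is 1.0, while `overlap_ratio(set(range(100)), {1, 2})` is 0.02.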

Parallelize across nodes

We can scale pretty easily on any work node (like EC2), so if we can prepare work units and distribute them across machines, we ought to be able to crank...

Anonymized real world data set

The test data is great, but makes it hard to test performance at the scale Suggestomatic was intended for. Internally at Causes we have a test set of about 900m...

Faster set intersection

There are a few algorithms that are supposed to be faster than O(m+n) time for a set intersection. Most of these are in academic papers that take 10 pages to describe...
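For reference, the O(m+n) baseline and one well-known alternative can be sketched as follows. This assumes each set is stored as a sorted, duplicate-free list of IDs, which is a guess about the data layout. The search-based variant is not necessarily one of the algorithms the issue refers to; it is simply a common technique that does better than the linear merge when one set is much smaller than the other.

```python
from bisect import bisect_left


def intersect_merge(a, b):
    """Linear merge of two sorted, duplicate-free lists: O(m + n)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_search(small, large):
    """Binary-search each element of `small` in `large`: O(m log n)."""
    out = []
    lo = 0  # everything before `lo` in `large` is smaller than the current x
    for x in small:
        lo = bisect_left(large, x, lo)
        if lo == len(large):
            break  # no remaining element of `large` can match
        if large[lo] == x:
            out.append(x)
            lo += 1
    return out
```

Both functions return the same result on the same inputs. The second only pays off when m is much smaller than n, since m log n exceeds m + n once the two sizes are comparable.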

Unit tests for C engine

Related to ccf4ff3bd207c6f14486578dd95a2a8392f9af3d, there's no sanity check for some of the more basic things the engine is doing. There should be a full test suite instead of just a smoke...

Unit tests for data preparation

61e59f891f17485c528c19b6736e4d24b8c5aa53 showed how fragile the data preparation step really is -- a unit test suite that performs sanity checks on the data is going to be crucial to the continual...
