Tim Hobson

Results 13 issues of Tim Hobson

Bugs:
- [x] Reported type precision & recall are identical across all patch types
- [x] Column accuracy metric incorrectly calculated in cases of failed type recall
- [ ]...

One approach to calibration is to choose penalty parameters based on the results of experiments on standard (UCI) datasets. If this approach is adopted (there are others), an additional dataset...

Following [Charles's notes](https://github.com/alan-turing-institute/aida-datadiff/blob/master/notes/datadiff-experiment-plan.md):
- [x] Provide access to UCI datasets
- [x] Add functions to randomly sample from the set of valid patch objects:
  - [x] permute
  - [x] shift...

Currently the ddiff algorithm works only under the assumption that columns are either inserted or deleted (or neither), but not both.

Existing unit tests cover inserts, inserts+permutes, deletes & deletes+permutes. A final "mixed" test is required that combines all three corruption types (insert, delete and permute) in a single case.
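
As a rough illustration of what a "mixed" test fixture might look like, the sketch below applies one delete, one insert and one column permutation to a small table. It is written in Python purely for illustration and is not taken from the ddiff package; the function name `corrupt_mixed`, its parameters, and the dict-of-lists table representation are all hypothetical.

```python
import random


def corrupt_mixed(columns, delete, insert_name, insert_values, seed=0):
    """Apply a delete, an insert and a column permutation to a table.

    `columns` maps column name -> list of values (a hypothetical
    stand-in for a data frame). The input table is not modified.
    """
    rng = random.Random(seed)

    # Delete: copy every column except the one being removed.
    corrupted = {name: list(vals) for name, vals in columns.items()
                 if name != delete}

    # Insert: add a new column that has no counterpart in the original.
    corrupted[insert_name] = list(insert_values)

    # Permute: shuffle the column order (dicts preserve insertion order).
    order = list(corrupted)
    rng.shuffle(order)
    return {name: corrupted[name] for name in order}


table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
mixed = corrupt_mixed(table, delete="b", insert_name="z", insert_values=[0, 0])
```

A test built on such a fixture would then check that the diff recovers all three operations at once. Note that a random shuffle can occasionally return the original order, so a real test would probably fix the permutation explicitly.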

CS wrote on 17/08/2017: Performance on detecting shift and scale is not very good. This could be for at least three reasons: a) The synthetic problems are too hard. i.e....

Currently parameters are estimated by passing the diffness measure to the generic R optimisation function (stats::optimise). We could certainly make this more efficient for the specific case of the K-S...
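
The generic approach described above can be sketched as follows: a two-sample K-S statistic serves as the diffness measure and a general-purpose one-dimensional minimiser searches for the shift parameter. This is a Python illustration only, not the package's R code. The golden-section search is a simplified stand-in for `stats::optimise` (which combines golden-section search with parabolic interpolation), and the names `ks_statistic` and `golden_section_minimise` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs, evaluated at every data point."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(xs, p) / len(xs) - bisect_right(ys, p) / len(ys))
        for p in xs + ys
    )


def golden_section_minimise(f, lower, upper, tol=1e-6):
    """Generic bounded 1-D minimiser (golden-section search).
    Only guaranteed to find the minimum of a unimodal function."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Estimate a shift by minimising the K-S distance between x + s and y.
x = [0.3, 1.1, 2.4, 3.8, 5.0]
y = [v + 3.0 for v in x]
shift = golden_section_minimise(
    lambda s: ks_statistic([v + s for v in x], y), -10.0, 10.0
)
```

Because the K-S statistic is a step function of the shift, a generic minimiser evaluates it many times and may settle on a plateau near the best value rather than on the exact optimum, which is one reason a method tailored to this statistic could be more efficient.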

The performance of ddiff ought not to depend on whether a column contains a character vector or a factor, but a discrepancy was observed when running on the UK broadband...