
Ideas for testing

Open bytesnake opened this issue 5 years ago • 5 comments

This issue collects some ideas you can use to improve your testing. Writing tests is normally a time-consuming task, and a large number of tests is crucial for good coverage.

  • check that invalid hyper-parameters are rejected (see the sketch after this list)
  • check the construction of a model on a well-known dataset
  • check that linearly separable data yields optimal accuracy
  • check special qualities of your algorithm, e.g. that it can detect or is robust against outliers, constructs sparse solutions, etc.
  • use a real-world dataset and compare the performance with similar implementations
  • look at how scikit-learn performs its testing
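
For the first point, a hyper-parameter validation test could look like the minimal sketch below. `ExampleParams` and `ParamError` are hypothetical placeholders, not actual linfa types; substitute the parameter builder of the algorithm you are testing.

```rust
// A minimal, self-contained sketch; `ExampleParams` is a hypothetical
// hyper-parameter builder, not an actual linfa type.
struct ExampleParams {
    n_clusters: usize,
}

#[derive(Debug)]
enum ParamError {
    // `n_clusters` must be at least 1 for clustering to make sense.
    InvalidClusterCount,
}

impl ExampleParams {
    fn check(&self) -> Result<(), ParamError> {
        if self.n_clusters == 0 {
            return Err(ParamError::InvalidClusterCount);
        }
        Ok(())
    }
}

#[test]
fn invalid_hyper_parameters_are_rejected() {
    // Zero clusters is not a meaningful configuration, so validation should
    // fail instead of letting a later `fit` call panic.
    assert!(ExampleParams { n_clusters: 0 }.check().is_err());
}
```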

If you have any specific test idea for any algorithm in the linfa ecosystem, please add a comment below :tada:

bytesnake avatar Dec 04 '20 16:12 bytesnake

Would it be ok to write tests referencing the datasets in the dataset folder (like iris) to try and replicate scikit-learn's tests?

Sauro98 avatar Dec 15 '20 20:12 Sauro98

Good point, I have created a PR (https://github.com/rust-ml/linfa/pull/72) which introduces linfa-datasets for this purpose.

bytesnake avatar Dec 16 '20 13:12 bytesnake

There is now a small section at the end of the CONTRIBUTE file explaining how to use linfa-datasets.
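
For illustration, a test against the bundled iris data might look roughly like the sketch below; the feature flag and the exact dataset methods are assumptions about the linfa-datasets crate, so treat the CONTRIBUTE file as authoritative.

```rust
// Assumed dev-dependency in Cargo.toml (feature name may differ):
// linfa-datasets = { version = "*", features = ["iris"] }

#[test]
fn splits_iris_into_train_and_validation() {
    // Load the bundled iris data and split it 80/20 into train/validation.
    let (train, valid) = linfa_datasets::iris().split_with_ratio(0.8);

    // Iris ships with 150 samples, so the split must not lose any of them.
    assert_eq!(train.nsamples() + valid.nsamples(), 150);

    // ...fit a model on `train` and evaluate it on `valid` here...
}
```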

bytesnake avatar Dec 16 '20 16:12 bytesnake

Thank you!

Sauro98 avatar Dec 16 '20 16:12 Sauro98

Can we have benchmarks that measure the algorithms' accuracy rather than their runtime performance? For example, for clustering algorithms we can measure the sum of squared distances of each sample to its nearest centroid as an accuracy metric.
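
As a rough sketch of that metric, the sum of squared distances of each sample to its nearest centroid (often called inertia) could be computed with plain ndarray like this; `inertia` is an illustrative helper, not an existing linfa function.

```rust
use ndarray::Array2;

// Illustrative helper: sum over all samples of the squared Euclidean
// distance to the closest centroid (smaller is better).
fn inertia(samples: &Array2<f64>, centroids: &Array2<f64>) -> f64 {
    samples
        .outer_iter()
        .map(|sample| {
            centroids
                .outer_iter()
                .map(|centroid| (&sample - &centroid).mapv(|d| d * d).sum())
                .fold(f64::INFINITY, f64::min)
        })
        .sum()
}
```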

YuhanLiin avatar Mar 24 '21 16:03 YuhanLiin