datumaro
Test suite to cover typical scenarios on supported datasets
To improve testing of the library, we need a table of experiments run on the supported datasets.
Experiments:
- Convert a public dataset to the Datumaro format and back to the original format. Metrics: status and correctness of the conversion, time to download, read, write, and convert the dataset.
- Take a public dataset and merge it with itself in different modes. Modes: remove duplicates, keep all annotations, merge similar annotations into one. Metrics: status and correctness of the operation, time to merge.
- Extract a subset of a supported dataset, modify the subset, and merge it back. Metrics: status and correctness of the operation, time to extract the subset and to merge it back.
- Take a public dataset, modify it, and navigate through its history. Metrics: status and correctness of the operation, time to move backward and forward through the history, and the disk space required.
- etc
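A minimal, library-agnostic sketch of how one row of the results table could be collected: each scenario is a list of named steps, every step is timed, and the run stops at the first failure so the row records both status and per-step timings. The step functions in the usage part are hypothetical stand-ins working on an in-memory toy dataset; in the real suite they would call Datumaro's import, export, merge and similar operations.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: Optional[str] = None


@dataclass
class ExperimentReport:
    dataset: str
    scenario: str
    steps: List[StepResult] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The experiment passes only if every executed step passed.
        return all(s.ok for s in self.steps)

    def as_row(self) -> Dict[str, Any]:
        # One row of the results table: status plus the time of each step.
        row: Dict[str, Any] = {
            "dataset": self.dataset,
            "scenario": self.scenario,
            "status": "passed" if self.ok else "failed",
        }
        for s in self.steps:
            row[f"{s.name}_s"] = round(s.seconds, 3)
        return row


def run_experiment(dataset: str, scenario: str,
                   steps: List[Tuple[str, Callable[[Any], Any]]]) -> ExperimentReport:
    """Run steps in order, passing each step's output to the next one."""
    report = ExperimentReport(dataset, scenario)
    state: Any = None
    for name, fn in steps:
        start = time.perf_counter()
        try:
            state = fn(state)
            ok, err = True, None
        except Exception as exc:  # a failing step marks the experiment as failed
            ok, err = False, repr(exc)
        report.steps.append(StepResult(name, ok, time.perf_counter() - start, err))
        if not ok:
            break  # later steps depend on this one, so stop here
    return report


if __name__ == "__main__":
    # Hypothetical toy dataset: each item has an id and a list of labels.
    source = [{"id": "img1", "labels": ["cat"]}, {"id": "img2", "labels": ["dog", "cat"]}]

    def normalize(items):
        return {it["id"]: sorted(it["labels"]) for it in items}

    def read(_):
        return list(source)

    def to_intermediate(items):
        # Stand-in for "convert to the Datumaro format".
        return normalize(items)

    def to_original(mapping):
        # Stand-in for "convert back to the original format".
        return [{"id": k, "labels": v} for k, v in mapping.items()]

    def check(items):
        # Correctness: the round trip must preserve every item and label.
        if normalize(items) != normalize(source):
            raise ValueError("round-trip changed the dataset")
        return items

    report = run_experiment("toy", "round-trip", [
        ("read", read), ("convert", to_intermediate),
        ("convert_back", to_original), ("check", check),
    ])
    print(report.as_row())
```

Stopping at the first failed step keeps the timings meaningful: a step that never ran gets no column rather than a misleading zero.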
It would be great to run the test suite once a week and publish the results with every release. There are several benefits:
- Stress testing on real datasets and scenarios
- Performance testing
- Improve stability and correctness of the library