Annif-tutorial
Exercise about a sufficient amount of training data (learning curves)
A common question in the tutorial sessions has been "how many documents do I need for training a model?". We could have an optional exercise that shows how increasing the --docs-limit value when training a model affects the model's evaluation results. A simple way to plot the results as a learning curve would also be nice.
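A minimal sketch of what such an exercise could look like, under stated assumptions: the helper names `build_train_commands` and `text_learning_curve` are hypothetical and not part of Annif, the project ID and corpus path are placeholders, and the evaluation scores are expected to be collected by hand from the evaluation runs (the sketch does not parse any Annif output). It only builds one train command per --docs-limit value and draws a plain-text learning curve from the scores you supply.

```python
from typing import Dict, List


def build_train_commands(project_id: str, corpus_path: str,
                         limits: List[int]) -> List[List[str]]:
    """Build one `annif train` command per --docs-limit value.

    project_id and corpus_path are placeholders supplied by the caller;
    the commands are only constructed here, not executed.
    """
    return [
        ["annif", "train", project_id, corpus_path, "--docs-limit", str(n)]
        for n in limits
    ]


def text_learning_curve(scores: Dict[int, float], width: int = 40) -> str:
    """Render {docs_limit: score} as a plain-text bar chart.

    The scores must come from your own evaluation runs; bars are scaled
    relative to the best score so the curve shape is easy to see.
    """
    top = max(scores.values())
    lines = []
    for n in sorted(scores):
        bar = "#" * round(width * scores[n] / top) if top > 0 else ""
        lines.append(f"{n:>8} docs | {bar} {scores[n]:.3f}")
    return "\n".join(lines)
```

Each command list can be passed to `subprocess.run` to train the model, followed by an evaluation run whose score (for example the F1 score) is noted down for that limit. Once the curve flattens, adding more training documents is unlikely to help much.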
As a first step I added an extra section to the MLLM exercise: https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/05_mllm_project.md#extra-experiment-with-different-amounts-of-training-data