Train a Project using API
It would be great to have a REST method where you could upload a compressed archive and have a model trained from it. You would probably have to include the project configuration in the call. An alternative would be to create new projects via the API and train them afterwards. To stop the disk from filling up, the intermediate files should be removed after training, or there has to be an API call for cleaning up intermediate data. The latter variant would also allow parameter optimization using the uploaded data.
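A rough sketch of the "upload, train, clean up" flow, using only the Python standard library. The names `train_from_archive` and `train` are hypothetical and not part of Annif; the sketch assumes the upload is a zip archive already read into memory and that training is some callable that takes a corpus directory.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def train_from_archive(archive_bytes: bytes, train: Callable[[Path], None]) -> None:
    """Extract an uploaded zip archive, train on it, then discard the files.

    `train` is a placeholder for whatever actually trains the project.
    """
    # TemporaryDirectory deletes the directory and everything in it when the
    # block exits, including when train() raises an exception.
    with tempfile.TemporaryDirectory(prefix="corpus-") as tmp:
        corpus_dir = Path(tmp)
        with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
            archive.extractall(corpus_dir)
        train(corpus_dir)
```

Keeping the extracted data around for later parameter optimization would instead need a persistent directory plus the separate cleanup call mentioned above.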
You would probably need some kind of locking for the projects file to keep it in sync with the projects of the running instance.
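One possible shape for such locking, as a minimal sketch: an exclusive advisory lock on a sidecar lock file, taken before the projects file is modified. This assumes a POSIX system (`fcntl` is not available on Windows) and that every writer cooperates by taking the same lock. The helper names `locked` and `append_project` are made up for illustration.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def locked(path: Path):
    """Hold an exclusive advisory lock on a sidecar file next to `path`."""
    lock_path = path.with_name(path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def append_project(cfg_path: Path, section_text: str) -> None:
    """Append one project section to the config file while holding the lock."""
    with locked(cfg_path):
        with open(cfg_path, "a", encoding="utf-8") as cfg:
            cfg.write(section_text)
```

The lock only serializes writers. The running instance would still have to notice the change and reload its projects, which this sketch does not cover.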
This sounds like a good feature and something I've also thought about in the past. However, it's quite a challenge to do all of this while also ensuring data consistency. It may be worthwhile to split this into several features that are implemented separately:
- create a project (with configuration) via the REST API
- alter the configuration of an existing project via the REST API
- train the project via the REST API
- add support for these operations in the Web UI
There is already a tiny bit of functionality in this direction - the learn method in the REST API. Perhaps some of that code could be used for inspiration.
For the API design, I think it might be worth looking at the Maui Server HTTP API which has similar goals (they have taggers, Annif has projects).
There could also be a CLI command to create a new project configuration, named e.g. create-project or new-project.
Most backend-specific parameters could be set to default values taken from projects.cfg.dist and written to the configuration, unless they can be omitted altogether (i.e. when the default values are hardcoded in the backend). If a backend has some necessary parameters that cannot be defaulted, they could be given with the existing --backend-param option (after some tweaking of it). Usage could look like this:
annif create-project yso-fi --backend nn_ensemble --language fi --vocab yso \
--backend-param sources=yso-mllm-fi,yso-fasttext-fi
This could speed up creating new projects, which currently is often done by copy-pasting an example project configuration from the wiki or projects.cfg.dist.
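To illustrate, here is a minimal sketch of how such a command might turn its arguments into a configuration section, using `configparser` from the standard library. The function `render_project` and its parameters are hypothetical, and the `defaults` mapping stands in for values that would be read from projects.cfg.dist. It is not how Annif currently works.

```python
import configparser
import io


def render_project(project_id, backend, language, vocab,
                   backend_params=(), defaults=None):
    """Return the INI section for one project as text."""
    # Interpolation is disabled so that "%" in a value is kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config[project_id] = {"language": language, "backend": backend, "vocab": vocab}
    # Apply defaults first so explicit key=value parameters override them.
    config[project_id].update(defaults or {})
    for item in backend_params:
        key, _, value = item.partition("=")
        config[project_id][key] = value
    out = io.StringIO()
    config.write(out, space_around_delimiters=False)
    return out.getvalue()


print(render_project("yso-fi", "nn_ensemble", "fi", "yso",
                     backend_params=["sources=yso-mllm-fi,yso-fasttext-fi"]))
```

The resulting text could then be appended to the projects file.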
For creating the project configuration via the CLI, a better approach than giving backend parameters as options would be to ask for them with interactive prompts. There could even be autocompletion showing the selectable values, e.g. for the backend.
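A small sketch of what the prompting could look like, with a crude stand-in for autocompletion: a unique prefix of a choice is accepted. The function names are made up for illustration, the list of backends is passed in by the caller, and `ask` defaults to the built-in `input` so that it can be replaced in tests.

```python
def prompt_text(question, ask=input):
    """Ask until a non-empty answer is given."""
    while True:
        answer = ask(f"{question}: ").strip()
        if answer:
            return answer


def prompt_choice(question, choices, ask=input):
    """Ask until the answer is one of `choices` or a unique prefix of one."""
    while True:
        answer = ask(f"{question} [{'/'.join(choices)}]: ").strip()
        if answer in choices:
            return answer
        matches = [c for c in choices if answer and c.startswith(answer)]
        if len(matches) == 1:
            return matches[0]


def collect_project_settings(backends, ask=input):
    """Interactively collect the basic settings for a new project."""
    return {
        "project_id": prompt_text("Project ID", ask),
        "backend": prompt_choice("Backend", backends, ask),
        "language": prompt_text("Language", ask),
        "vocab": prompt_text("Vocabulary", ask),
    }
```

Real tab completion would need a prompt library or shell completion support. The prefix matching here only shows the idea.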