Feature request: add project specs and backend info to REST API
This is a follow-on from #912 -- thank you for that!
Currently, the only way to get some project information (analyzer_spec, vocab_spec, transform_spec, backend.params and backend.DEFAULT_PARAMETERS) is directly from an AnnifProject object. In cannif, I import the annif module directly to read them, and I would prefer to not have to.
It would be very helpful to have these exposed within the REST interface instead. I'm mindful of overloading the /projects response object, and the fact there are backend and vocab dicts already. However, at a minimum the *_spec fields would go a long way, and backend_params gives enough information to recreate or update a project.
I would even support a /backends endpoint for the information provided in backend.DEFAULT_PARAMETERS, especially if it provides a list or dict of the configured / installed backends, e.g. by somehow identifying or omitting those whose external dependencies are not installed.
Also to add, I am happy to submit this code through a PR if you are in agreement on the idea. Thanks.
Thank you for the suggestion and your offer to implement it. We are definitely interested in expanding the REST API functionality this way. Having cannif as a concrete use case for this is very helpful!
Can you provide an example of how the response objects could look?
Fantastic! Currently I pull project information from the analyzer_spec, vocab_spec and transform_spec attributes, and backend information from default_params() and params: https://github.com/mjsuhonos/Annif-corpora/blob/master/cannif/cannif_streamlit.py#L67-L71
The resulting project dict looks like this:
{
"project_id": "my-project",
"name": "My Project",
"language": "en",
"backend": {
"backend_id": "my-backend",
"default_params": {
"limit": 100,
"max_ngram_size": 4,
"deduplication_threshold": 0.9,
"deduplication_algo": "levs",
"window_size": 1,
"num_keywords": 100,
"features": null,
"label_types": [
"prefLabel",
"altLabel"
],
"remove_parentheses": false
},
"backend_params": {
"limit": 200,
"max_ngram_size": 2,
"deduplication_threshold": 0.9,
"deduplication_algo": "levs",
"window_size": 1,
"num_keywords": 200,
"features": null,
"label_types": [
"prefLabel",
"altLabel"
],
"remove_parentheses": false,
"name": "My Project",
"language": "en",
"backend": "my-backend",
"vocab": "my-project(en)",
"analyzer": "snowball(english)",
"transform": "limit(20000)"
}
},
"analyzer_spec": "snowball(english)",
"vocab_spec": "my-project(en)",
"transform_spec": "limit(20000)",
"vocab": {
"vocab_id": "yso",
"languages": [
"en"
],
"size": 1000,
"loaded": true
},
"vocab_language": "en",
"is_trained": true,
"modification_time": "2025-03-13T19:41:32.592630+00:00"
}
That is a much bigger response! However, there is a lot of redundancy in backend_params, and I would propose:
- keeping analyzer_spec, vocab_spec, transform_spec values on the project
- filtering backend_params to only include parameter values which have been modified from the defaults
This would result in a response like:
{
"project_id": "my-project",
"name": "My Project",
"language": "en",
"backend": {
"backend_id": "my-backend",
"default_params": {
"limit": 100,
"max_ngram_size": 4,
"deduplication_threshold": 0.9,
"deduplication_algo": "levs",
"window_size": 1,
"num_keywords": 100,
"features": null,
"label_types": [
"prefLabel",
"altLabel"
],
"remove_parentheses": false
},
"backend_params": {
"limit": 200,
"max_ngram_size": 2,
"num_keywords": 200
}
},
"analyzer_spec": "snowball(english)",
"vocab_spec": "my-project(en)",
"transform_spec": "limit(20000)",
"vocab": {
"vocab_id": "yso",
"languages": [
"en"
],
"size": 1000,
"loaded": true
},
"vocab_language": "en",
"is_trained": true,
"modification_time": "2025-03-13T19:41:32.592630+00:00"
}
Still bigger, but it now provides enough information to fully populate a form for editing (or creating) projects, with default values.
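The filtering of backend_params proposed above could be sketched roughly as follows. This is only a minimal illustration, not Annif code: the function name `modified_params` and the `PROJECT_KEYS` set are hypothetical, and it assumes both dicts hold values of comparable types (if the configured values arrive as strings, they would need converting before comparison).

```python
from typing import Any, Mapping

# Project-level keys that show up inside backend_params in the example above.
# Treating them as redundant is an assumption of this sketch.
PROJECT_KEYS = frozenset(
    {"name", "language", "backend", "vocab", "analyzer", "transform"}
)


def modified_params(
    params: Mapping[str, Any],
    defaults: Mapping[str, Any],
    exclude: frozenset = PROJECT_KEYS,
) -> dict:
    """Return only the parameters whose values differ from the defaults.

    Keys listed in `exclude` are dropped; keys with no default are kept.
    """
    return {
        key: value
        for key, value in params.items()
        if key not in exclude and (key not in defaults or defaults[key] != value)
    }


# Values taken from the example project above (shortened).
defaults = {"limit": 100, "max_ngram_size": 4, "window_size": 1, "num_keywords": 100}
params = {
    "limit": 200,
    "max_ngram_size": 2,
    "window_size": 1,
    "num_keywords": 200,
    "name": "My Project",
    "language": "en",
}

overrides = modified_params(params, defaults)
print(overrides)
```

For the example values this leaves only limit, max_ngram_size and num_keywords, which matches the reduced backend_params shown above.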