
Feature request: add project specs and backend info to REST API

Open mjsuhonos opened this issue 2 months ago • 3 comments

This is a follow-on from #912 -- thank you for that!

Currently, the only way to get some project information (analyzer_spec, vocab_spec, transform_spec, backend.params and backend.DEFAULT_PARAMETERS) is directly from an AnnifProject object. In cannif, I import the annif module directly to read them, which I would prefer to avoid.

It would be very helpful to have these exposed within the REST interface instead. I'm mindful of overloading the /projects response object, and the fact there are backend and vocab dicts already. However, at a minimum the *_spec fields would go a long way, and backend_params gives enough information to recreate or update a project.

I would even support a /backends endpoint for the information provided in backend.DEFAULT_PARAMETERS, especially if it provides a list or dict of the configured or installed backends, e.g. by flagging or omitting those whose external dependencies are not installed.
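To illustrate the "installed backends" idea, here is a minimal sketch of how availability could be checked, assuming each backend can be mapped to the top-level modules it needs. The function name `available_backends` and the backend-to-module mapping are hypothetical and are not part of Annif; only `importlib.util.find_spec` from the standard library is used.

```python
import importlib.util


def available_backends(requirements):
    """Map each backend id to True if all of its required modules can be found.

    `requirements` is a hypothetical dict of backend id -> list of top-level
    module names; it is an assumption made for illustration, not an Annif API.
    """
    return {
        backend_id: all(
            importlib.util.find_spec(module) is not None for module in modules
        )
        for backend_id, modules in requirements.items()
    }


# Hypothetical example data: one backend with no extra dependencies,
# one that needs a stdlib module, and one that needs a missing module.
example_requirements = {
    "no-deps-backend": [],
    "stdlib-backend": ["json"],
    "missing-backend": ["surely_not_installed_module_xyz"],
}
status = available_backends(example_requirements)
```

A /backends response could then either include this flag per backend or leave out the entries where it is false.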

mjsuhonos avatar Nov 13 '25 18:11 mjsuhonos

I should add that I am happy to submit this as a PR if you agree with the idea. Thanks.

mjsuhonos avatar Nov 15 '25 17:11 mjsuhonos

Thank you for the suggestion and your offer to implement it. We are definitely interested in expanding the REST API functionality this way. Having cannif as a concrete use case for this is very helpful!

Can you provide an example of how the response objects could look?

osma avatar Nov 17 '25 07:11 osma

Fantastic! Currently I pull project information from the analyzer_spec, vocab_spec and transform_spec attributes, and backend information from default_params() and params: https://github.com/mjsuhonos/Annif-corpora/blob/master/cannif/cannif_streamlit.py#L67-L71

The resulting project dict looks like this:

{
	"project_id": "my-project",
	"name": "My Project",
	"language": "en",
	"backend": {
		"backend_id": "my-backend",
		"default_params": {
			"limit": 100,
			"max_ngram_size": 4,
			"deduplication_threshold": 0.9,
			"deduplication_algo": "levs",
			"window_size": 1,
			"num_keywords": 100,
			"features": null,
			"label_types": [
				"prefLabel",
				"altLabel"
			],
			"remove_parentheses": false
		},
		"backend_params": {
			"limit": 200,
			"max_ngram_size": 2,
			"deduplication_threshold": 0.9,
			"deduplication_algo": "levs",
			"window_size": 1,
			"num_keywords": 200,
			"features": null,
			"label_types": [
				"prefLabel",
				"altLabel"
			],
			"remove_parentheses": false,
			"name": "My Project",
			"language": "en",
			"backend": "my-backend",
			"vocab": "my-project(en)",
			"analyzer": "snowball(english)",
			"transform": "limit(20000)"
		}
	},
	"analyzer_spec": "snowball(english)",
	"vocab_spec": "my-project(en)",
	"transform_spec": "limit(20000)",
	"vocab": {
		"vocab_id": "yso",
		"languages": [
		"en"
		],
		"size": 1000,
		"loaded": true
	},
	"vocab_language": "en",
	"is_trained": true,
	"modification_time": "2025-03-13T19:41:32.592630+00:00"
}

That is a much bigger response! However, there is a lot of redundancy in backend_params, and I would propose:

  • keeping analyzer_spec, vocab_spec, transform_spec values on the project
  • filtering backend_params to only include parameter values which have been modified from the defaults

This would result in a response like:

{
	"project_id": "my-project",
	"name": "My Project",
	"language": "en",
	"backend": {
		"backend_id": "my-backend",
		"default_params": {
			"limit": 100,
			"max_ngram_size": 4,
			"deduplication_threshold": 0.9,
			"deduplication_algo": "levs",
			"window_size": 1,
			"num_keywords": 100,
			"features": null,
			"label_types": [
				"prefLabel",
				"altLabel"
			],
			"remove_parentheses": false
		},
		"backend_params": {
			"limit": 200,
			"max_ngram_size": 2,
			"num_keywords": 200
		}
	},
	"analyzer_spec": "snowball(english)",
	"vocab_spec": "my-project(en)",
	"transform_spec": "limit(20000)",
	"vocab": {
		"vocab_id": "yso",
		"languages": [
		"en"
		],
		"size": 1000,
		"loaded": true
	},
	"vocab_language": "en",
	"is_trained": true,
	"modification_time": "2025-03-13T19:41:32.592630+00:00"
}

Still bigger, but it now provides enough information to fully populate a form for editing (or creating) projects, with default values.
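As a rough sketch of the filtering step proposed above, the reduced backend_params could be computed by comparing the project's parameters against the backend defaults. The helper name `modified_params` is hypothetical, and the sketch assumes that keys absent from the defaults (name, language, backend, vocab, analyzer, transform) are project-level settings that should be dropped, as in the example response.

```python
def modified_params(params, defaults):
    """Return only the backend parameters whose values differ from the defaults.

    Assumption: keys that do not appear in `defaults` are project-level
    settings (e.g. "name", "analyzer") and are left out of the result.
    """
    return {
        key: value
        for key, value in params.items()
        if key in defaults and defaults[key] != value
    }


# Values taken from the example project dict above.
default_params = {
    "limit": 100,
    "max_ngram_size": 4,
    "deduplication_threshold": 0.9,
    "deduplication_algo": "levs",
    "window_size": 1,
    "num_keywords": 100,
    "features": None,
    "label_types": ["prefLabel", "altLabel"],
    "remove_parentheses": False,
}
backend_params = {
    "limit": 200,
    "max_ngram_size": 2,
    "deduplication_threshold": 0.9,
    "deduplication_algo": "levs",
    "window_size": 1,
    "num_keywords": 200,
    "features": None,
    "label_types": ["prefLabel", "altLabel"],
    "remove_parentheses": False,
    "name": "My Project",
    "language": "en",
    "backend": "my-backend",
    "vocab": "my-project(en)",
    "analyzer": "snowball(english)",
    "transform": "limit(20000)",
}

filtered = modified_params(backend_params, default_params)
```

A client can rebuild the effective parameters by overlaying the filtered values on default_params, e.g. `{**default_params, **filtered}`.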

mjsuhonos avatar Nov 20 '25 15:11 mjsuhonos