
loadvoc command should take a vocabulary id, not project id

Open osma opened this issue 3 years ago • 0 comments

The annif loadvoc command currently takes a project ID as a parameter, like this:

annif loadvoc yso-tfidf-fi path/to/yso.ttl

But this can be misleading, because the same vocabulary may be shared by many projects, so loading it through one project also loads/updates it for all the others. The problem will become even more prominent after implementing #559 / #600, which makes vocabularies multilingual, so that even projects in different languages may share the same vocabulary (and vocabulary ID).

I suggest changing this so that the loadvoc command instead takes a vocabulary ID, like this:

annif loadvoc yso path/to/yso.ttl

This would align better with current reality, but of course it's a potentially disruptive change: scripts that perform loadvoc operations would have to be modified and all the relevant documentation updated, including the Annif tutorial. There could perhaps be a transition period where loadvoc with a project ID keeps working but prints a deprecation warning...
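
To make the transition-period idea concrete, here is a minimal sketch in Python. It is not Annif's actual code: the `VOCABS` and `PROJECTS` registries, the second project name and the `resolve_vocab_id` helper are all hypothetical stand-ins for whatever the real configuration lookup would be. The argument is tried as a vocabulary ID first; a project ID still works but triggers a deprecation warning.

```python
import warnings

# Hypothetical registries standing in for the real configuration.
VOCABS = {"yso"}                       # known vocabulary IDs
PROJECTS = {                           # project ID -> vocabulary ID it uses
    "yso-tfidf-fi": "yso",
    "yso-mllm-en": "yso",
}


def resolve_vocab_id(identifier, vocabs=VOCABS, projects=PROJECTS):
    """Map a loadvoc argument to a vocabulary ID.

    Vocabulary IDs are accepted directly. Project IDs are still accepted
    during the transition period, but emit a DeprecationWarning.
    """
    if identifier in vocabs:
        return identifier
    if identifier in projects:
        vocab_id = projects[identifier]
        warnings.warn(
            f"passing a project ID ('{identifier}') to loadvoc is deprecated; "
            f"use the vocabulary ID '{vocab_id}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return vocab_id
    raise ValueError(f"unknown vocabulary or project ID: {identifier}")
```

In this sketch a vocabulary ID takes precedence if the same string happens to be both a vocabulary ID and a project ID, so existing scripts only change behaviour in that collision case.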

There's also the question of how to deal with languages, especially when loading a vocabulary from a TSV file. Currently the language of a TSV vocabulary is inferred from the project configuration. But if the vocabulary is loaded directly, there is no project configuration to infer it from, so the language may need to be specified explicitly, for example with a --language option (shortened to -L, which is not otherwise used in current CLI commands). This would also be useful for SKOS vocabularies that lack language tags, as discussed in #556.
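
A rough sketch of how such an option could behave, written with the standard-library `argparse` purely for illustration (it does not reproduce Annif's real CLI code). The helper names `build_parser` and `parse_loadvoc_args` and the rule "a .tsv file requires a language" are assumptions made for this example only.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a stand-in parser for the proposed loadvoc signature."""
    parser = argparse.ArgumentParser(prog="annif loadvoc")
    parser.add_argument("vocab_id", help="vocabulary ID, e.g. yso")
    parser.add_argument("subjectfile", help="path to the vocabulary file")
    parser.add_argument(
        "-L", "--language", default=None,
        help="language of the vocabulary file (assumed required for TSV)",
    )
    return parser


def parse_loadvoc_args(argv):
    """Parse arguments and insist on a language for TSV vocabularies."""
    parser = build_parser()
    args = parser.parse_args(argv)
    is_tsv = Path(args.subjectfile).suffix.lower() == ".tsv"
    if is_tsv and args.language is None:
        # No project configuration is available to infer the language from.
        parser.error("a TSV vocabulary needs --language / -L")
    return args
```

With this sketch, `parse_loadvoc_args(["yso", "path/to/yso.ttl"])` succeeds without a language, while a `.tsv` path without `-L` exits with a usage error.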

osma, Aug 04 '22 13:08