Store metadata of project training

Open juhoinkinen opened this issue 6 years ago • 4 comments

It could be valuable to store some information about the training of a project, which could also be shown by the CLI commands and the REST API /projects/{project_id} method.

Currently show-project outputs the following:

Project ID:        tfidf-fi
Project Name:      TF-IDF Finnish
Language:          fi
Vocabulary:        yso
Vocab language:    fi
Access:            public
Trained:           True
Modification time: 2023-04-21 10:33:16

Maybe some of the following data could be added:

  • [x] training state: trained or not-trained
  • [x] vocabulary state: loaded or not-loaded
  • [x] timestamp of the training run
  • [ ] version of Annif that has been used in training/learning the project
  • [ ] number of documents used to train the project
  • [ ] name(s) of the file(s) that supplied the documents when training the project
  • [ ] used backend parameters
  • [ ] backend name
  • [ ] duration of the training run
  • [ ] ~size of the model on disk~
  • [ ] timestamp(s) of learning run(s)

Also some more details on the training data (what?) and something that now goes to the debug log (what?).

These data could be stored e.g. in a metadata file data/projects/the_project/annif_metadata along with the model file(s).
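
As a rough illustration of the metadata file idea, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the JSON format, the field names, and the helper functions `write_training_metadata` and `read_training_metadata` are hypothetical and not part of Annif's current code. Only the file name `annif_metadata` comes from the suggestion above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file name, taken from the suggestion in this issue.
METADATA_FILENAME = "annif_metadata"


def write_training_metadata(project_dir, *, annif_version, backend,
                            backend_params, n_documents, corpus_paths,
                            started, finished):
    """Write a JSON metadata file next to the model files (illustrative only).

    All field names are assumptions; `started` and `finished` are
    timezone-aware datetime objects marking the training run.
    """
    metadata = {
        "annif_version": annif_version,
        "backend": backend,
        "backend_params": backend_params,
        "n_documents": n_documents,
        "corpus_paths": [str(p) for p in corpus_paths],
        "trained_at": finished.isoformat(),
        "training_duration_s": (finished - started).total_seconds(),
    }
    path = Path(project_dir) / METADATA_FILENAME
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so an
    # interrupted write does not leave a truncated metadata file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path


def read_training_metadata(project_dir):
    """Return the stored metadata dict, or None if the file does not exist."""
    path = Path(project_dir) / METADATA_FILENAME
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A command such as show-project could then call something like `read_training_metadata` and print whichever fields are present, treating a missing file as "no training metadata recorded".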

Edit: converted bullet list to checkbox list.

juhoinkinen avatar Sep 18 '19 12:09 juhoinkinen

These are all good ideas. Also it could be useful to have a "failed" status for projects/backends where the initialization fails for some reason - but that means making the initialization more careful.

I think it makes sense to start small and implement status features one (or two) at a time in separate PRs. For example trained vs. not-trained would be a good start.

Marking the issue as Long term because there are so many ideas here, but that doesn't mean we couldn't implement some of them very soon.

osma avatar Sep 30 '19 12:09 osma

Tagging with 0.48 to indicate that some work in this direction (but not everything) should be done in that release.

osma avatar May 12 '20 11:05 osma