Automatically add metadata to Hugging Face Hub repos when uploading projects

Open juhoinkinen opened this issue 1 year ago • 6 comments

With this PR, when running annif upload:

  • if README.md (Model Card) does not exist in the destination repository, then README.md is created with default contents and some metadata of the uploaded projects,
  • if README.md exists, its metadata are updated as necessary.

Closes #790.

The metadata includes these:

language:
- <language-code tags automatically obtained from the uploaded projects>
tags:
- annif   # custom tag
pipeline_tag: text-classification  # HFH tag
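
The create-or-update rule for this metadata could look roughly like the sketch below. It is a minimal illustration in plain Python, not the code from this PR: the function name `merge_card_metadata`, the dict representation of the metadata, and the choice to keep an already existing `pipeline_tag` are all assumptions made for the example.

```python
def merge_card_metadata(existing, project_languages):
    """Return model card metadata with the Annif defaults merged in.

    `existing` is the metadata of a README.md already in the repo as a dict,
    or None when there is no README.md yet (hypothetical representation).
    """
    metadata = dict(existing or {})

    # Model card metadata may hold a single language string or a list of them.
    current = metadata.get("language", [])
    if isinstance(current, str):
        current = [current]

    # Merge language codes, keeping any that are already listed.
    metadata["language"] = sorted(set(current) | set(project_languages))

    # Ensure the custom "annif" tag is present without duplicating it.
    tags = list(metadata.get("tags", []))
    if "annif" not in tags:
        tags.append("annif")
    metadata["tags"] = tags

    # Assumption: an existing pipeline tag is kept rather than overwritten.
    metadata.setdefault("pipeline_tag", "text-classification")
    return metadata
```

Calling it with `None` gives the default metadata for a new card, while calling it with existing metadata leaves unrelated keys untouched.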

The Model Card text content is minimal: it has just the repo name as the heading and instructions for downloading projects from the repo. See an example at https://huggingface.co/juhoinkinen/Annif-models-upload-testing.

juhoinkinen avatar Jun 17 '24 14:06 juhoinkinen

About @osma's suggestions in https://github.com/NatLibFi/Annif/issues/790#issuecomment-2137376118:

> For example it could include the Annif version used for training, the backend, vocabulary name and size, possibly some of the hyperparameters / configuration settings as well.

  • Annif version:
    • The Annif version used for training is not stored anywhere at the moment, and the version performing the upload is not necessarily the same. This metadata would first have to be stored somewhere, which is tracked in https://github.com/NatLibFi/Annif/issues/329
  • Backend, vocabulary name and other project configuration:
    • These are available in the <project-id>.cfg files, accessible from the Files and versions tab, e.g. https://huggingface.co/NatLibFi/FintoAI-data-YSO/blob/main/yso-en.cfg, so I think they are not worth putting in the Model Card.

juhoinkinen avatar Jun 17 '24 14:06 juhoinkinen

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.65%. Comparing base (3b5f7a1) to head (e4febab). Report is 51 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
+ Coverage   99.64%   99.65%   +0.01%     
==========================================
  Files          91       93       +2     
  Lines        6817     7058     +241     
==========================================
+ Hits         6793     7034     +241     
  Misses         24       24              

codecov[bot] avatar Jun 18 '24 09:06 codecov[bot]

@CodiumAI-Agent /review

juhoinkinen avatar Jun 18 '24 09:06 juhoinkinen

PR Reviewer Guide 🔍

(Review updated until commit https://github.com/NatLibFi/Annif/commit/845f53d74fee07c94b7f97be5dbd73550eb4ef58)

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Key issues to review

Error Handling
The function upsert_modelcard lacks error handling for potential failures during the push_to_hub operation. Consider adding try-except blocks to handle exceptions that might arise during the push operation, ensuring that the function can gracefully handle errors and provide meaningful feedback to the user.
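
One way to act on this suggestion is sketched below. It is only an illustration under stated assumptions: `UploadFailedError` is a hypothetical stand-in for whatever user-facing exception Annif would raise, and `push` is a hypothetical callable standing in for the model card's push operation, injected so that the example runs without network access.

```python
class UploadFailedError(Exception):
    """Hypothetical stand-in for a user-facing Annif error."""


def push_card_safely(push, repo_id):
    """Call `push(repo_id)` and convert any failure into a readable error.

    `push` is a hypothetical callable representing the push operation.
    """
    try:
        return push(repo_id)
    except Exception as err:  # real code would catch narrower exception types
        # Chain the original exception so the root cause stays visible.
        raise UploadFailedError(
            f"Updating the model card in {repo_id} failed: {err}"
        ) from err
```

Catching the broad `Exception` keeps the sketch short; real code would more likely catch only the error types that the Hub client is documented to raise.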

Configuration Error Handling
The error handling in _read_config might not provide clear feedback to the user since it directly raises ConfigurationException with err.message, which might not be defined. It's recommended to ensure that the exception message is informative and user-friendly.
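
A defensive variant of this could look like the sketch below. It is not the code under review: `ConfigurationError`, `describe_error` and `read_config` are hypothetical names, and the standard library's `configparser` is used here only as an assumption about how the configuration is parsed. The point it illustrates is that Python 3 exceptions have no `message` attribute in general (although `configparser.Error` does define one), so falling back to `str(err)` is the safer choice.

```python
import configparser


class ConfigurationError(Exception):
    """Hypothetical stand-in for Annif's ConfigurationException."""


def describe_error(err):
    # Use `message` when the exception defines it, otherwise fall back to str().
    return getattr(err, "message", None) or str(err)


def read_config(text):
    """Parse INI-style configuration text, raising a readable error on failure."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as err:
        raise ConfigurationError(
            f"Could not read project configuration: {describe_error(err)}"
        ) from err
    return parser
```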

QodoAI-Agent avatar Jun 18 '24 09:06 QodoAI-Agent

> Possible Bug: Ensure that the upsert_modelcard function handles cases where project language data might be missing or malformed. The current implementation assumes that proj.vocab_lang is always available and valid.

Good point by the AI, but I think the project language is always set if this point is reached...?

juhoinkinen avatar Jun 18 '24 10:06 juhoinkinen

Persistent review updated to latest commit https://github.com/NatLibFi/Annif/commit/845f53d74fee07c94b7f97be5dbd73550eb4ef58

QodoAI-Agent avatar Sep 18 '24 13:09 QodoAI-Agent

I added an automatically updating Projects section to the modelcard, like this: https://huggingface.co/juhoinkinen/Annif-models-upload-testing#projects
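
For readers wondering how such a section can be kept up to date without touching hand-written text, here is a minimal sketch. It is not taken from this PR: the marker comments, the function name `update_projects_section`, and the choice to replace the whole list on every upload are assumptions made for illustration.

```python
import re

# Hypothetical markers delimiting the automatically maintained part.
START = "<!-- projects:start -->"
END = "<!-- projects:end -->"


def update_projects_section(card_text, project_ids):
    """Insert or replace the delimited Projects section in the card text."""
    lines = "\n".join(f"- {pid}" for pid in sorted(project_ids))
    section = f"{START}\n## Projects\n{lines}\n{END}"
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(card_text):
        # Replace only the delimited part; hand-written text stays untouched.
        return pattern.sub(lambda _match: section, card_text, count=1)
    # No section yet: append it to the end of the card.
    return card_text.rstrip("\n") + "\n\n" + section + "\n"
```

Running it twice on the same card replaces the earlier list instead of appending a second one.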

juhoinkinen avatar Sep 18 '24 13:09 juhoinkinen