Model card support
Model card support proposal
Model cards are widely used summary documents; for example, huggingface.co shows them as the default model view. While this style of model card can be rich in metadata, it is up to the document's authors to include the optional header fields in the repo's README.md file. There are, however, more generic formats that use metadata to build the model card, such as Google model cards and IBM fact sheets.
Some model card guidelines also suggest using templates for Hugging Face model cards.
There are also reported user studies (about Hugging Face model cards) suggesting how to structure one, which further points toward constructing a model card from metadata templates.
Possible implementations
To support model cards in Model Registry, it would be necessary to gather meta-metadata (metadata describing the metadata fields) so that parsing becomes possible. Possible approaches include:
- Naming conventions for fields
Pros:
- Very simple
Cons:
- Field names are prone to typos if validation is only optional
- Any other validation would seem like magic, e.g. having mutually exclusive fields
- Parsing arbitrary field names also makes the API more obscure for new users
- Keeping the current metadata structure limits the structures we can both specify and validate
This could work if we split custom fields from parsed fields, but that creates a divide within otherwise useful metadata and complicates importing from "unknown" types that could be supported later.
- Restructuring metadata types to include the required meta-metadata information, such as field category or section
Pros:
- This type information can be used to assign arbitrary types to data, which can be very flexible.
- Would require minimal API changes, at the cost of a little more work for everyone using the API.
Cons:
- Custom types might require additional endpoints for registering and validating.
This might not solve the problem entirely, as there can be rules for the meta-metadata types, which would have to be validated dynamically and possibly "discovered" by clients at runtime. For example, if a user wants to register a vision model artifact under a multimodal model version, how do they know which fields are valid for that model before making the request?
- Templated metadata forms
Having standard forms is feasible as long as they can still have arbitrary fields per "section". However, this does not cover field validation rules, e.g. for different model types like vision and text-to-speech. To account for that, Model Registry could support metadata template registration (similar to MLMD type definitions).
Pros:
- Flexible and robust interface, server can almost fully understand metadata and provide (even more advanced) functionality
Cons:
- Hardest to implement on the UI
For the Python client, it would be possible to provide template building blocks so that clients can specify their own types.
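To make the "templated metadata forms" idea more concrete, here is a minimal sketch of what such building blocks could look like in Python. Every name in it (`FieldSpec`, `SectionSpec`, `Template`, `validate`) is hypothetical and is not part of the Model Registry client or of MLMD. It only illustrates sections that accept arbitrary extra fields by default, plus per-template validation rules for a given model type.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field of a template section (hypothetical building block)."""
    name: str
    type_: type
    required: bool = False


@dataclass
class SectionSpec:
    """A named section; extra fields are allowed unless `closed` is True."""
    name: str
    fields: list[FieldSpec] = field(default_factory=list)
    closed: bool = False


@dataclass
class Template:
    """A metadata form made of sections, e.g. one template per model type."""
    name: str
    sections: list[SectionSpec]

    def validate(self, metadata: dict[str, dict[str, Any]]) -> list[str]:
        """Return human-readable problems; an empty list means valid."""
        errors = []
        for section in self.sections:
            values = metadata.get(section.name, {})
            known = {spec.name for spec in section.fields}
            for spec in section.fields:
                if spec.name not in values:
                    if spec.required:
                        errors.append(f"{section.name}.{spec.name}: missing required field")
                elif not isinstance(values[spec.name], spec.type_):
                    errors.append(f"{section.name}.{spec.name}: expected {spec.type_.__name__}")
            if section.closed:
                # Only closed sections reject fields the template does not know.
                for extra in sorted(set(values) - known):
                    errors.append(f"{section.name}.{extra}: unknown field")
        return errors


# Assumed example template for a vision model.
vision = Template(
    name="vision-model",
    sections=[
        SectionSpec("overview", [FieldSpec("license", str, required=True)]),
        SectionSpec("input", [FieldSpec("image_size", int, required=True)], closed=True),
    ],
)

card = {
    "overview": {"license": "apache-2.0", "team": "research"},  # extra field is fine here
    "input": {"image_size": "224", "sample_rate": 16000},  # wrong type, unknown field
}
problems = vision.validate(card)
```

In this sketch the open `overview` section accepts the extra `team` field, while the closed `input` section reports both the wrongly typed `image_size` and the unknown `sample_rate`. A server-side template registry could apply the same kind of check, which is what would let clients discover valid fields before sending a request.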
Proposed solution
It would be possible to iterate on these ideas progressively, starting with the simplest one. As a first step, MR would parse only a single JSON-serialized field, generated by the client that wants a model card. The client parses its own metadata and sends a custom format, which could be iterated upon.
For a second step, MR would either need to completely abstract MLMD custom properties or switch the metadata store entirely, so this discussion will be left as WIP.
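The first step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the property name `model_card`, the `format_version` key, the section layout and both helper functions are made up for this example and do not reflect any existing Model Registry API.

```python
import json
from typing import Any, Optional

# Hypothetical name of the single custom property the registry would parse.
MODEL_CARD_KEY = "model_card"


def build_model_card(metadata: dict[str, Any]) -> str:
    """Client side: select the metadata to display and serialize it as one JSON string."""
    overview_keys = ("name", "description", "license")
    card = {
        "format_version": 1,  # lets the custom format be iterated upon later
        "sections": {
            "overview": {k: metadata[k] for k in overview_keys if k in metadata},
            "evaluation": metadata.get("metrics", {}),
        },
    }
    return json.dumps(card, sort_keys=True)


def parse_model_card(custom_properties: dict[str, str]) -> Optional[dict[str, Any]]:
    """Registry side: parse the one known field and ignore every other property."""
    raw = custom_properties.get(MODEL_CARD_KEY)
    if raw is None:
        return None  # no model card was requested for this model
    card = json.loads(raw)
    if not isinstance(card, dict) or "sections" not in card:
        raise ValueError("model card must be a JSON object with a 'sections' key")
    return card


# Example round trip with assumed metadata values.
props = {
    "framework": "pytorch",
    MODEL_CARD_KEY: build_model_card(
        {"name": "resnet", "license": "apache-2.0", "metrics": {"top1": 0.76}}
    ),
}
card = parse_model_card(props)
```

The point of the design is that the registry only needs to understand one field, so all other custom properties stay untouched while the card format evolves on the client side.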
Thank you for this rich report @isinyaaa
About
And there are also reported user studies (about Model Cards from HuggingFace) suggesting how to structure one. Which further points in the direction of constructing a model card from metadata templates.
Is the link in the quote the expected one, please? It just points again to the template.
.
In general, my view: I can't see that there is a single "standard" Model Card format, and we should avoid overfitting to one existing solution (i.e. HF only). Do you share this pov?
.
Having a link to the original Model Card via DocArtifact when importing from HF could make it easier to preserve a way to display the "original" one. The same applies when importing from other registries, i.e.: was there a ModelCard in the original model registry which can be linked? If yes, provide a link to it when importing into Model Registry. wdyt about this possibility?
.
In the case the model is indexed directly on (our) Model Registry, I believe the solution should remain flexible, while guiding the user about which information could be helpful to collect. This is what I like a lot about the template proposal.
From the UI and model card generation pov, we should not rely on hardcoded names or naming conventions for fields/subfields; I concur that the potential solution "naming conventions for fields" is brittle in this regard, and I would not pursue it.
I also believe the potential solution "templated metadata forms" would support enterprises/users that have their own model card format. I don't think we should be the ones "imposing" one format or another. Is this a guiding principle we agree upon?
.
I'm not sure I fully understand why new type definition(s) would be needed in MLMD for templates; my intuition based on the report was that we could more simply dedicate a custom artifact to hold a YAML representation of the model card, for those cases where the user does not want to simply reuse the already existing UI blocks as the default Model Card rendering, i.e.: something ~like (mockup)
In the survey/assessment, we shall also consider:
- model metadata from systems where it is curated/maintained, such as Llama stack, especially wrt recommended parameters, as that is precious metadata for inferencing/Model Serving
@tarilabs updated the HF user studies link, please refer to: huggingface.co/docs/hub/model-cards-user-studies#user-studies
@tarilabs updated the HF user studies link, please refer to: huggingface.co/docs/hub/model-cards-user-studies#user-studies
thanks, that and the remainder of the linked resources will be helpful to discuss with UXD
In general, my view: I can't see that there is a single "standard" Model Card format, and we should avoid overfitting to one existing solution (i.e. HF only). Do you share this pov?
Agreed!
Having a link to the original Model Card via DocArtifact when importing from HF could make it easier to preserve a way to display the "original" one. The same applies when importing from other registries, i.e.: was there a ModelCard in the original model registry which can be linked? If yes, provide a link to it when importing into Model Registry. wdyt about this possibility?
Definitely sounds like a good idea for HF imports :)
In the case the model is indexed directly on (our) Model Registry, I believe the solution should remain flexible, while guiding the user about which information could be helpful to collect. This is what I like a lot about the template proposal.
From the UI and model card generation pov, we should not rely on hardcoded names or naming conventions for fields/subfields; I concur that the potential solution "naming conventions for fields" is brittle in this regard, and I would not pursue it.
I also believe the potential solution "templated metadata forms" would support enterprises/users that have their own model card format. I don't think we should be the ones "imposing" one format or another. Is this a guiding principle we agree upon?
Yes, that's precisely the point I wanted to raise with the "templating" capabilities. My analogy to MLMD type definitions is that, just as MLMD enables users to define their own types, we should allow users to define their own "Model metadata forms", which end up being very similar in usage. Does that address your concern? Otherwise, I'm not sure I follow your last point about using the special YAML artifact type.
I would probably include the ModelKit spec (KitOps) as an option to consider.
In general, my view: I can't see that there is a single "standard" Model Card format, and we should avoid overfitting to one existing solution (i.e. HF only).
+1. We also need to make sure Model Registry doesn't overfit to GenAI and end up making mandatory some information that is specific to LLMs.
Personally, I like the idea of having some templating or even "types" to keep flexibility and support different formats without any strict format requirement/validation.
We should also consider metadata about input/output parameters. This is not covered by any of these specs as far as I can see, probably because it is runtime specific and usually OpenAI API "compatible", but we should consider extending the metadata to cover it (this has been done for predictive AI by KServe with the Open-Inference-Protocol).
From Hugging Face, take a look at https://huggingface.co/spaces/huggingface/Model_Cards_Writing_Tool
From Hugging Face, take a look at https://huggingface.co/spaces/huggingface/Model_Cards_Writing_Tool
this looks nice, but it often gives me back 🤔 🤷:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.