[FEATURE]: Proposed changes for MLBOM schema for CycloneDX 2.0
Describe the feature
This issue will capture proposed changes and action items raised as part of the MLBOM work group towards improving the ML schema for CycloneDX 2.0
- Fields in `modelParameters` should be moved to the top-level `modelCard` schema
- Explore `releaseNotes` being made plural to account for the same (identical) component being released simultaneously to different platforms/repositories (i.e., models released to HF, Ollama, etc.)
  - We agree "best practices" should be written on how to account for this use case (when/how to use multiple release notes) while assuring the component identity is the same...
- `architectureFamily` as a simple `string` may not be enough, as there are now commonly many `hybrid` architectures with an increasing rate of divergence (not just assuming transformers, and also taking into account models that include multiple models, such as a `smolDocling` or `tableFormer` model...)
  - Consider if this field has value IF we actually allow a more precise description of the architectural "layers" inside `modelArchitecture`, if redesigned to do so
  - Additionally, consider things like `dense`, `sparse`, `moe`, etc.
- Redesign `modelArchitecture` to allow a description of the layers that compose the model -> TODO @mrutkows
- `modelParameters` should reflect the # of "learned" parameters
  - Note: EU CRA says each parameter needs to be described??? Need more info on this; for a BOM this would be unsupportable, as well as impossible to derive from any scanning tool
- `task` needs to be plural, i.e., `tasks`
  - TODO: does this need to be a string? an enum? more complex?
- NEW: Add `trainingConsiderations` to describe the training processes
  - TODO: more discussion/design needed
- TODO: Discuss reworking of `inputs` and `outputs` (both `string`s); the current design is not clear... assume from the description that these are really (chat) template parameters (which may vary by template, as models can have multiple)
- TODO: Need to add a description of required (or fixed) `hyperparameters` (e.g., `params.json`)
  - Many models require certain params and will NOT work properly (produce invalid results) if they are not set properly (e.g., image model clip rects; guardrails models need `temperature` set to zero, etc.)
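The required/fixed hyperparameters idea above could be checked mechanically. A minimal sketch, assuming hypothetical `hyperparameters`, `fixed`, and `requiredValue` field names (none of which exist in the current CycloneDX schema):

```python
# Hypothetical sketch of a "required hyperparameters" check on an MLBOM
# model card fragment. All field names here are illustrative only; they
# are NOT part of the current CycloneDX schema.

def check_fixed_hyperparameters(model_card: dict) -> list[str]:
    """Return problems where a declared fixed hyperparameter does not
    match its required value."""
    problems = []
    for hp in model_card.get("hyperparameters", []):
        if hp.get("fixed") and hp.get("value") != hp.get("requiredValue"):
            problems.append(
                f"{hp['name']}: expected {hp['requiredValue']!r}, got {hp.get('value')!r}"
            )
    return problems

# Example: a guardrails model that requires temperature pinned to zero.
card = {
    "hyperparameters": [
        {"name": "temperature", "fixed": True, "requiredValue": 0, "value": 0.7},
        {"name": "top_p", "fixed": False, "value": 0.9},
    ]
}
print(check_fixed_hyperparameters(card))  # flags the temperature mismatch
```

A consumer (e.g., an inference gateway) could run such a check before serving the model, which is the motivation for making these constraints machine-readable rather than prose-only.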
Additional considerations
- Extend `modelCards` to allow for similar, new concepts for "system cards" (system-level usage of a model) and "agent cards" (models used in agentic instances), which are being adopted by model providers.
Check out BBQ and an example of "ethical" bias and accuracy measures: https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2/modelcard
and specifically the BBQ project here: https://github.com/nyu-mll/BBQ
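If `modelCards` were extended toward system and agent cards, one possible shape is a simple discriminator field. A minimal sketch, assuming a hypothetical `cardType` field that is not part of any current or proposed CycloneDX schema:

```python
# Purely illustrative: a single "card" structure distinguished by a
# hypothetical "cardType" discriminator. Nothing here exists in CycloneDX.

VALID_CARD_TYPES = {"model", "system", "agent"}

def card_type_of(card: dict) -> str:
    """Return the card's type, defaulting to "model" so that existing
    model cards remain valid without modification."""
    card_type = card.get("cardType", "model")
    if card_type not in VALID_CARD_TYPES:
        raise ValueError(f"unknown cardType: {card_type!r}")
    return card_type

print(card_type_of({"cardType": "agent"}))  # -> agent
print(card_type_of({}))                     # -> model
```

A discriminator like this would keep the three card flavors in one schema object while letting tools branch on the type; whether that is better than three separate schemas is exactly the design question above.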
Recap of Meeting on Oct 1, 2025
Overview:
- Discussed parameters: We do not list all parameters for a model. Instead, we ask Model PICs to list the total number of parameters.
- Units are:
- Million - m
- Billion - B
- Trillion - T
- Input/Output: Describes tensor shapes
- Hyperparameters: Temperature
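The unit convention above (a single total count with an m/B/T suffix, rather than a per-parameter listing) can be sketched as a small formatter; the function name is hypothetical:

```python
# Illustrative helper for the parameter-count convention discussed above:
# report the total number of learned parameters with a unit suffix
# (m / B / T) instead of enumerating parameters. Name is hypothetical.

def format_param_count(total: int) -> str:
    """Render a raw parameter count using m / B / T units."""
    for threshold, unit in ((10**12, "T"), (10**9, "B"), (10**6, "m")):
        if total >= threshold:
            value = total / threshold
            # Trim a trailing ".0" so 9000000000 renders as "9B", not "9.0B".
            return f"{value:.1f}".rstrip("0").rstrip(".") + unit
    return str(total)

print(format_param_count(9_000_000_000))  # -> "9B"
print(format_param_count(1_500_000_000))  # -> "1.5B"
print(format_param_count(70_000_000))     # -> "70m"
```

Keeping the BOM field as a plain integer and applying a formatter like this at display time would avoid ambiguity about whether "B" means 10^9 or 2^30.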
Bias:
- Ethical Considerations: specific to bias
- Bias Metric: Do we have consensus around specific evals? For LLMs, we use BBQ (developed by NYU's nyu-mll group) and we point to BBQ here. Example: see the bias subcard of nvidia-nemotron-nano-9b-v2 Model by NVIDIA | NVIDIA NIM
- Greatest difference in performance: here, we are referring to accuracy
- Matt mentioned performance tradeoffs, but this category seems unclear
Explainability:
- Most of the responses here are arrays of strings
- Known Risks, Intended Users, and Technical Limitations may not yet be covered
Next Steps:
- Discuss Privacy and Safety subcards
- Action Items (to discuss next meeting):
  - Matt to take a look at the Train/Test/Eval section
  - Michael to help develop taxonomy recommendations
Links:
- CycloneDX components: https://cyclonedx.org/docs/1.6/json/#components_items_type
- BBQ repo: https://github.com/nyu-mll/BBQ (Repository for the Bias Benchmark for QA dataset)